The 12 Most Common Automation Bugs (And How to Build Workflows That Don't Break)

Every automation eventually breaks. The question isn't whether yours will fail — it's whether you'll know when it does, why it failed, and how fast you can fix it. After shipping hundreds of automations for B2B clients (and inheriting hundreds more in various states of disrepair), we've seen the same dozen bugs over and over.

This post is the catalog. For each bug: what it looks like, why it happens, and the pattern that prevents it. If you're building or maintaining automation, this is your pre-flight checklist.

1. Silent Failures

What it looks like. Workflow runs successfully, no errors, but nothing actually happens. Records aren't created. Notifications don't send. Sync doesn't update.

Why it happens. The workflow's logic conditions aren't matching what you think they are. An IF node evaluates false when you expected true. A filter excludes everything. A null value cascades through transformations without triggering an error.

The pattern that prevents it. Always log the intent of each branch, not just errors. At the end of every workflow, log a summary: "Processed 23 records, created 18, skipped 5 due to filter conditions." If the number of skipped records grows unexpectedly, you have a problem.
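A minimal sketch of that end-of-run summary, assuming each step records a (record_id, outcome, reason) tuple. The function names, outcome labels, and the 50% skip threshold are illustrative assumptions, not part of any platform's API.

```python
from collections import Counter


def summarize_run(outcomes):
    """Summarize a run from (record_id, outcome, reason) tuples.

    outcome is "created" or "skipped"; reason explains a skip (None otherwise).
    """
    counts = Counter(outcome for _, outcome, _ in outcomes)
    skip_reasons = Counter(
        reason for _, outcome, reason in outcomes if outcome == "skipped"
    )
    summary = (
        f"Processed {len(outcomes)} records, "
        f"created {counts['created']}, skipped {counts['skipped']}"
    )
    if skip_reasons:
        detail = ", ".join(f"{reason}: {n}" for reason, n in skip_reasons.most_common())
        summary += f" ({detail})"
    return summary


def skip_ratio_exceeded(outcomes, max_skip_ratio=0.5):
    """Return True when the share of skipped records is above the threshold."""
    if not outcomes:
        return False
    skipped = sum(1 for _, outcome, _ in outcomes if outcome == "skipped")
    return skipped / len(outcomes) > max_skip_ratio
```

Logging the summary on every run, and alerting when skip_ratio_exceeded returns True, turns a silent failure into a visible one.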

2. Infinite Loops Between Systems

What it looks like. Your sync workflow updates a record in System A, which triggers System A's change event, which fires the workflow, which updates System B, which triggers System B's change event, which fires the workflow again. Repeat forever. Your API usage, and the bill that comes with it, explodes.

Why it happens. Bi-directional sync between systems where both sides emit change events on update.

The pattern that prevents it. Use a "last sync timestamp" pattern. Each time your integration writes to a record, store the time of that write. Before processing an inbound change, check that timestamp. If your own write landed within the last sync window (e.g., 5 minutes), skip the update, because the event is probably an echo of that write. Alternative: write through a dedicated integration user account, and skip any change events authored by that user.

We've personally debugged infinite loops in Stripe-Salesforce syncs more than once. They cost real money before we caught them. The timestamp pattern is non-negotiable.
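Both echo checks described above can be sketched in a few lines. This assumes the workflow records when it last wrote each record, and that change events carry the ID of the user who made the change. The event shape, the "modified_by" key, and the integration user ID are hypothetical.

```python
from datetime import datetime, timedelta, timezone

SYNC_WINDOW = timedelta(minutes=5)          # the 5-minute example window
INTEGRATION_USER_ID = "integration-bot"     # hypothetical integration account


def is_echo(event, last_write_at, now=None):
    """Return True if an inbound change event looks like an echo of our own write.

    event: dict with a "modified_by" user ID (assumed field name).
    last_write_at: timezone-aware UTC time of our last write to this record,
                   or None if we have never written to it.
    """
    now = now or datetime.now(timezone.utc)
    # Check 1: the change was authored by our own integration user.
    if event.get("modified_by") == INTEGRATION_USER_ID:
        return True
    # Check 2: we wrote to this record within the sync window.
    if last_write_at is not None and now - last_write_at <= SYNC_WINDOW:
        return True
    return False
```

Note that the timestamp check also skips genuine edits made within the window, so the window is best kept as short as the systems involved allow.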

3. Token and Auth Expiry

What it looks like. The workflow ran fine for months. Now every execution returns "401 Unauthorized" or "403 Forbidden."

Why it happens. OAuth tokens expire. Service account keys rotate. API credentials get revoked because someone hit "regenerate" in a settings panel. The workflow's stored credentials are no longer valid.

The pattern that prevents it. First, refresh tokens automatically where possible (n8n's OAuth credentials do this; some custom integrations need manual handling). Second, set up monitoring that alerts you when auth failures spike. Third, document credential expiry dates and rotation procedures.
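For custom integrations that need manual handling, one option is to refresh a little before expiry instead of waiting for a 401. The sketch below assumes a caller-supplied fetch_token function that returns a (token, expires_in_seconds) pair; that function, the class name, and the 60-second margin are assumptions for illustration.

```python
import time


class TokenCache:
    """Cache an access token and refresh it shortly before it expires."""

    def __init__(self, fetch_token, refresh_margin_seconds=60, clock=time.time):
        self._fetch_token = fetch_token      # hypothetical: returns (token, expires_in)
        self._margin = refresh_margin_seconds
        self._clock = clock                  # injectable so the logic can be tested
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return a valid token, fetching a new one if the old one is near expiry."""
        if self._token is None or self._clock() >= self._expires_at - self._margin:
            token, expires_in = self._fetch_token()
            self._token = token
            self._expires_at = self._clock() + expires_in
        return self._token
```

This does not help when a credential is revoked outright, which is why the alerting on auth failures still matters.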

4. Rate Limits

What it looks like. Most executions succeed. Sometimes you get "429 Too Many Requests." Errors cluster around specific times of day.

Why it happens. APIs have rate limits. Salesforce's API has org-level call limits. Stripe rate-limits aggressive API usage. HubSpot has rate tiers. When your workflow bursts through these limits, you get rate-limited.

The pattern that prevents it. Three layers. First, respect each API's rate limit explicitly in your workflow: Salesforce reports your API usage in a response header (Sforce-Limit-Info), so read it. Second, implement exponential backoff on 429 errors. Third, for batch operations, throttle to a known-safe rate (we typically use batch size 5 with 500ms intervals for Stripe).
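The backoff layer can be sketched as follows. It assumes a caller-supplied send function that performs one request and returns (status, headers, body). That signature, the RateLimited exception, and the default delays are assumptions, and only the integer-seconds form of Retry-After is handled.

```python
import random
import time


class RateLimited(Exception):
    """Raised when the API is still returning 429 after all attempts."""


def call_with_backoff(send, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call send() and retry on HTTP 429 with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        retry_after = headers.get("Retry-After")
        if retry_after is not None and retry_after.isdigit():
            # The server told us how long to wait; trust it.
            delay = float(retry_after)
        else:
            # Double the delay each attempt, capped, with 50-100% jitter.
            delay = min(max_delay, base_delay * 2 ** attempt)
            delay *= 0.5 + rng() / 2
        sleep(delay)
    raise RateLimited(f"still rate-limited after {max_attempts} attempts")
```

The jitter keeps several workflows that were rate-limited at the same moment from all retrying at the same moment too.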

The hardest version: each workflow is fine on its own, but you have 12 workflows all hitting the same API at the same time. Budget the rate limit in aggregate across all your workflows, not per workflow.
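A per-workflow throttle using the batch size 5 and 500ms figures above might look like the sketch below. The function name and parameters are illustrative. It limits a single workflow only, so an aggregate budget across workflows would need a limiter kept in a shared store.

```python
import time


def process_in_batches(items, handler, batch_size=5, interval_seconds=0.5,
                       sleep=time.sleep):
    """Run handler over items in fixed-size batches, pausing between batches."""
    results = []
    for start in range(0, len(items), batch_size):
        if start > 0:
            sleep(interval_seconds)  # pause before every batch except the first
        for item in items[start:start + batch_size]:
            results.append(handler(item))
    return results
```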

5. Data Type Mismatches

What it looks like. "Cannot convert string to integer." "Invalid date format." Sometimes records succeed, sometimes they fail, depending on the input.

Why it happens. Source systems return data in inconsistent formats. A "phone number" field comes back as a string sometimes and as a number sometimes. A date comes as ISO 8601 in some records and as "MM/DD/YYYY" in others. Free-text fields contain characters that break downstream parsers.

The pattern that prevents it. Treat every external data input as untrusted. Explicitly cast types. Validate against expected schemas. Normalize formats (dates always to ISO 8601, phone numbers always to E.164) before processing. Use a "data normalization" node early in every workflow.
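As a sketch of what a normalization step might do, the helpers below cover the two examples above: dates arriving as ISO 8601 or MM/DD/YYYY, and phone numbers arriving as strings or numbers. They assume date-only inputs and treat 10-digit numbers as North American (country code 1). Both choices are assumptions to adjust for your data.

```python
import re
from datetime import datetime


def normalize_date(value):
    """Return an ISO 8601 date (YYYY-MM-DD) from YYYY-MM-DD or MM/DD/YYYY input."""
    text = str(value).strip()
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")


def normalize_phone(value, default_country_code="1"):
    """Return an E.164-style number (+ followed by up to 15 digits)."""
    if isinstance(value, float) and value.is_integer():
        value = int(value)  # a phone number that arrived as 5551234567.0
    text = str(value).strip()
    digits = re.sub(r"\D", "", text)
    if text.startswith("+"):
        candidate = digits  # country code already present
    elif len(digits) == 10:
        candidate = default_country_code + digits  # assumed national number
    elif len(digits) == 11 and digits.startswith(default_country_code):
        candidate = digits
    else:
        raise ValueError(f"cannot normalize phone number: {value!r}")
    if not 1 <= len(candidate) <= 15:
        raise ValueError(f"phone number has invalid length: {value!r}")
    return "+" + candidate
```

Raising on anything unrecognized is deliberate: a loud failure at the normalization node is easier to debug than a bad value three systems downstream.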

6. Timezone Bugs

What it looks like. Workflow scheduled to run at 9 AM Eastern actually runs at 9 AM UTC, which is 4 or 5 AM Eastern depending on daylight saving time. Reports cover the wrong date range. Stripe invoices generated at midnight UTC look like they belong to yesterday's records, not today's.

Why it happens. Systems disagree on timezones. Your n8n server runs in UTC. Your CRM stores timestamps in the user's local timezone. Your billing system uses the customer's account timezone. Without explicit handling, you get drift.

The pattern that prevents it. Always store and process timestamps in UTC internally. Convert to the local timezone only at the display layer. Document the timezone of every external system you integrate with. For scheduled triggers, explicitly set the timezone in your workflow configuration.
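The store-in-UTC, convert-at-display rule can be sketched with Python's standard zoneinfo module (Python 3.9+, and it assumes timezone data is available on the host). The function names are illustrative.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library, Python 3.9+


def to_utc(naive_local, zone_name):
    """Interpret a naive timestamp in the named zone and convert it to UTC."""
    return naive_local.replace(tzinfo=ZoneInfo(zone_name)).astimezone(timezone.utc)


def display_date(utc_dt, zone_name):
    """Return the calendar date a UTC timestamp falls on in the named zone."""
    return utc_dt.astimezone(ZoneInfo(zone_name)).date()
```

With these helpers, 9 AM in America/New_York converts to 13:00 UTC in June and 14:00 UTC in January, and an invoice stamped midnight UTC on June 1 displays as May 31 for an Eastern user. Using a named zone instead of a fixed offset is what keeps daylight saving time from shifting the result.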

7. Duplicate Processing

What it looks like. Same lead enters Salesforce twice. Same customer charged twice. Same Slack message sent to a channel three times in a row.

Why it happens. Workflows retry after what looks like a transient failure (a timeout, for example), even though the first attempt actually succeeded. Webhook deliveries get retried by source systems. Two workflows watching the same trigger both fire.

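The pattern that prevents it. Idempotency. Use a unique ID from the source system (Salesforce Event ID, Stripe Event ID, etc.) and track which IDs you've already processed. Before acting, check the dedup store. After processing, write the ID to it. If two executions can run at once, make that check-and-write a single atomic step (a unique constraint in Postgres, or SET with NX in Redis) so both cannot pass the check. We typically use Redis or a simple Postgres table for this.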

For Google Sheets triggers specifically (which can be flaky), we've replaced the native trigger with a polling pattern that uses Static Data to dedupe — same approach.
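One way to sketch the dedup store is with a table keyed on the event ID. The example uses SQLite from the standard library as a stand-in for the Postgres table, and the table and function names are assumptions. This version claims the ID atomically before acting and releases it if the action fails, a variation on the check-then-write order described above.

```python
import sqlite3


def open_dedup_store(path=":memory:"):
    """Open (and create if needed) a table of already-processed event IDs."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_events ("
        "event_id TEXT PRIMARY KEY, "
        "processed_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    conn.commit()
    return conn


def claim_event(conn, event_id):
    """Return True if this call claimed the event, False if it was already claimed."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
            (event_id,),
        )
    return cur.rowcount == 1  # 0 rows inserted means the ID was already there


def handle_once(conn, event_id, action):
    """Run action() at most once per event ID; return True if it ran."""
    if not claim_event(conn, event_id):
        return False
    try:
        action()
    except Exception:
        # Release the claim so a later retry can process the event.
        with conn:
            conn.execute("DELETE FROM processed_events WHERE event_id = ?", (event_id,))
        raise
    return True
```

The primary key does the real work here: the database, not the workflow, decides which execution wins.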

8. Missing Error Handlers

What it looks like. A workflow fails. No error workflow fires, because there isn't one. You discover the failure 3 weeks later when someone notices a missing invoice.

Why it happens. Most workflow platforms make it easy to build the happy path and harder to build error handling. Builders skip the error path because the workflow works the first 10 times they test it.

The pattern that prevents it. Every production workflow should have an error trigger workflow that fires on failure. That error workflow should classify the severity, log to a central error queue, alert the workflow owner in Slack with the execution URL, and (for repeated failures) create a Jira ticket.

n8n has a built-in Error Workflow setting that any workflow can point to. Use it on every production workflow.
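The classification and alert-building steps of such an error workflow could be sketched as below. The severity rules, the three-failure ticket threshold, and the payload keys are illustrative assumptions, not n8n's error data format.

```python
def classify_severity(error_message, failures_last_hour):
    """Assign a rough severity from the error text and recent failure count."""
    text = error_message.lower()
    if "401" in text or "403" in text or "unauthorized" in text:
        return "high"    # auth failures will not fix themselves
    if failures_last_hour >= 3:
        return "high"    # repeated failures suggest a persistent problem
    if "429" in text or "timeout" in text:
        return "low"     # likely transient
    return "medium"


def build_alert(workflow_name, execution_url, error_message, failures_last_hour):
    """Build the alert payload the error workflow would log and send to Slack."""
    severity = classify_severity(error_message, failures_last_hour)
    return {
        "severity": severity,
        "slack_text": (
            f"[{severity.upper()}] {workflow_name} failed: {error_message}\n"
            f"{execution_url}"
        ),
        "create_ticket": failures_last_hour >= 3,
    }
```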

9. Hardcoded References

What it looks like. "Sarah" left the company. Now your lead routing assigns to a deactivated user. "Acme Inc" got renamed to "Acme Holdings." Your matching breaks.

Why it happens. Builders hardcode user IDs, Account names, channel IDs, or other references inside the workflow. When the underlying entity changes, the workflow doesn't know.

The pattern that prevents it. Use external configuration. Store rotation lists in a Google Sheet or Notion database. Store channel routing in a config table. Reference users by their canonical ID, not their name. When you do hardcode (sometimes you have to), document it in workflow notes and add a quarterly review.

10. Race Conditions

What it looks like. Round-robin assignment sometimes assigns two leads to the same rep at the same time. Sequence numbers skip. Running counts end up wrong.

Why it happens. Two workflow executions are running simultaneously, both reading the same shared state, both updating it, both finishing — with the second overwriting the first.

The pattern that prevents it. Atomic updates. If you're maintaining a counter (like a round-robin rotation pointer), use an atomic increment operation — not a read-modify-write pattern. Most databases support atomic increments natively. Google Sheets does not, so for round-robin we typically use a Postgres table or Notion database with proper concurrency handling.

For high-volume workflows, consider a queue pattern: leads land in a queue, a single worker processes them sequentially, no concurrency issues.
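A minimal sketch of an atomic rotation pointer, using SQLite from the standard library as a stand-in for the Postgres table. The table layout and function names are assumptions. The increment and the read happen inside one transaction, so two executions cannot both read the same pointer value.

```python
import sqlite3


def open_rotation(path=":memory:"):
    """Open (and create if needed) a single-row table holding the rotation pointer."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rotation ("
        "id INTEGER PRIMARY KEY CHECK (id = 1), "
        "pointer INTEGER NOT NULL)"
    )
    conn.execute("INSERT OR IGNORE INTO rotation (id, pointer) VALUES (1, 0)")
    conn.commit()
    return conn


def next_rep(conn, reps):
    """Advance the pointer atomically and return the rep whose turn it is."""
    with conn:  # one transaction: the write lock is held until commit
        # The database performs the increment; we never write back a stale value.
        conn.execute("UPDATE rotation SET pointer = pointer + 1 WHERE id = 1")
        (pointer,) = conn.execute(
            "SELECT pointer FROM rotation WHERE id = 1"
        ).fetchone()
    return reps[(pointer - 1) % len(reps)]
```

In Postgres the same idea fits in a single statement by adding a RETURNING clause to the UPDATE.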

11. Schema Drift

What it looks like. Workflow ran fine yesterday. Today it's throwing "field not found" errors. Someone added or renamed a Salesforce field, deleted a HubSpot property, or restructured a Notion database.

Why it happens. Source system schemas change. Admins add fields. Engineering renames columns. Nobody told your workflow.

The pattern that prevents it. Test your critical workflows against a sandbox before deploying. Set up schema validation: at the start of each workflow, verify that expected fields exist and have expected types. Alert on schema mismatches. Build a quarterly "schema audit" process that checks your workflows against current source system schemas.
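The start-of-workflow schema check can be sketched as a comparison between an incoming record and a map of expected fields and types. The field names below are hypothetical examples, not a real Salesforce schema.

```python
# Hypothetical expectations for one record type; adjust to your own fields.
EXPECTED_FIELDS = {
    "Id": str,
    "Amount": (int, float),
    "CloseDate": str,
}


def schema_problems(record, expected=EXPECTED_FIELDS):
    """Return a list of human-readable schema mismatches (empty if none)."""
    problems = []
    for field, expected_type in expected.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            actual = type(record[field]).__name__
            problems.append(f"wrong type for {field}: got {actual}")
    return problems
```

Running this against the first record of each execution and alerting on a non-empty result catches a renamed or deleted field on the day it changes.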

12. Monitoring Gaps

What it looks like. A workflow has been failing for two weeks. Nobody noticed. It wasn't caught because the error workflow itself was also broken, or because alerts were going to a Slack channel that nobody monitors.

Why it happens. Monitoring is the boring last step that gets skipped. The setup is "we'll know if there's a problem" — until there's a problem and you don't know.

The pattern that prevents it. Three layers of monitoring. First, in-platform monitoring — every workflow has an error trigger pointing to a central error handler. Second, external uptime monitoring — a service like Better Uptime or Uptime Kuma pings your workflows on a schedule and alerts if they're not responding. Third, business-level monitoring — a daily check that compares expected vs actual workflow output (e.g., "we usually invoice 20-30 deals per day; today we invoiced 0; investigate").
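The business-level layer is often the simplest to build. A sketch, assuming you can count the day's output and know the usual range (the 20-30 invoices from the example above); the function name and message wording are illustrative.

```python
def check_daily_volume(actual, expected_min, expected_max, label="invoices"):
    """Return an alert message if today's count is outside the usual range, else None."""
    if actual < expected_min:
        return (
            f"ALERT: only {actual} {label} today "
            f"(usually {expected_min}-{expected_max}); investigate"
        )
    if actual > expected_max:
        return (
            f"ALERT: {actual} {label} today "
            f"(usually {expected_min}-{expected_max}); check for duplicates"
        )
    return None
```

Because this check looks at outcomes and not at executions, it still fires when the error workflow itself is the thing that broke.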

The Meta-Pattern: Build for Failure

Every workflow you ship will fail. The question is how gracefully. The patterns above all share a common theme: assume each step can fail, log enough information to debug it, fail loudly to the right person, and have a fallback path.

The teams that build the most reliable automation aren't the ones with the smartest builders. They're the ones who treat failure as a design requirement, not an edge case.

Is Bulletproof Automation Right for Your Team?

If you're building automation that touches customer-facing systems, financial systems, or anything that affects revenue, yes — the patterns above are non-negotiable. Cutting corners on reliability is what turns a working automation into a 2 AM rollback incident.

If you're building experimental, internal-only automation for low-stakes work, simpler is fine. Skip the monitoring infrastructure. Build the happy path. Iterate.

At Ops Automators, we build production-grade automation for B2B teams as our entire business. Every workflow we ship includes monitoring, error handling, and the patterns above by default. If you've inherited a brittle automation stack — or you're building one and want to ship it right the first time — that's our job.

Ready to automate? Book a free discovery call and we'll review your automation reliability posture.