The Data Quality Playbook: Cleaning Your CRM Before You Automate Anything
You can't automate your way out of bad data. Here's the playbook we use to cleanse a CRM before building automation — duplicate cleanup, field normalization, enrichment, and the workflows that keep it clean.
The biggest reason automation projects fail isn't the automation. It's the data the automation runs on. A workflow that routes leads by Country breaks the moment 40% of your leads don't have Country filled in. A scoring model fails when half your contacts don't have Industry. An invoice automation bills the wrong customer, or no one at all, when Account names don't match between systems.
You cannot automate your way out of bad data. The cleanup has to happen first.
IBM has estimated that poor data quality costs the U.S. economy over $3 trillion annually. That number is so big it's hard to feel. The more useful version: in any given B2B company, we typically find 15-40% of CRM records have data quality issues serious enough to break downstream automation. This post is the playbook we run to fix that before we build anything.
What "Bad CRM Data" Actually Means
There are five distinct categories of CRM data quality problems. Each needs different treatment.
1. Duplicates. Two contact records for the same person. Two account records for the same company. Two opportunities for the same deal. Duplicates cause double-emails, conflicting ownership, and reporting that overcounts.
2. Missing values. Records where required fields are blank. Country empty on 40% of leads. Industry unknown on 60% of accounts. Phone number missing on half your contacts.
3. Inconsistent formats. Country sometimes stored as "USA," sometimes as "United States," sometimes as "US." Phone numbers in 12 different formats. Industry values spanning 200 variations of the same concept ("SaaS" vs "Software-as-a-Service" vs "B2B Software").
4. Stale data. Contact information that was correct two years ago and isn't anymore. Job titles for people who switched companies. Email addresses that have bounced.
5. Wrong-system data. Account-level information stored at the contact level. Notes that should be tasks. Deal information that lives in custom fields when there's a built-in field for it. The data is technically there but in the wrong place.
A clean CRM is one that has acceptable rates across all five — not one that's perfect on a single dimension.
The 6-Phase Cleansing Playbook
Here's the order we run cleansing in. Sequence matters: each phase assumes the one before it is done. Normalizing or enriching before deduplicating, for example, means cleaning (and paying to enrich) records that later get merged away.
Phase 1: Audit (Days 1-3)
Before you cleanse anything, measure what you have. The audit identifies what's broken and how badly.
The reports we run:
- Record count by object (Contacts, Accounts, Opportunities, Leads).
- Records with missing values for each required field, as a % of total.
- Records with values that don't match validation rules.
- Duplicate analysis: matching by email exact, by email fuzzy, by company name fuzzy.
- Activity recency: when was each record last modified, last engaged with.
- Ownership analysis: how many records are owned by inactive users.
- Field usage analysis: how many records have a non-null value for each custom field (the answer for half your custom fields will be "less than 5%").
Deliverable: a "Data Quality Health Report" with concrete numbers and target metrics for each category.
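To make the audit concrete, here is a minimal sketch of two of the reports above (missing-value rates and exact-match email duplicates), assuming the records have been exported as a list of dictionaries. The field names and sample rows are placeholders, not a real CRM schema.

```python
from collections import Counter

# Hypothetical export rows; field names are placeholders.
records = [
    {"email": "ana@acme.com", "country": "US", "industry": "SaaS"},
    {"email": "bo@acme.com", "country": "", "industry": None},
    {"email": "cy@globex.com", "country": "United States", "industry": ""},
    {"email": "ana@acme.com", "country": None, "industry": "Software"},
]
REQUIRED_FIELDS = ["email", "country", "industry"]


def is_blank(value):
    """Treat None and whitespace-only strings as missing."""
    return value is None or str(value).strip() == ""


def missing_rates(rows, fields):
    """Return the percentage of rows with a blank value for each field."""
    total = len(rows)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: round(100 * sum(is_blank(r.get(field)) for r in rows) / total, 1)
        for field in fields
    }


def exact_duplicate_emails(rows):
    """Return lowercased emails that appear on more than one row."""
    counts = Counter(
        r["email"].strip().lower() for r in rows if not is_blank(r.get("email"))
    )
    return sorted(email for email, n in counts.items() if n > 1)


report = {
    "record_count": len(records),
    "missing_pct": missing_rates(records, REQUIRED_FIELDS),
    "duplicate_emails": exact_duplicate_emails(records),
}
print(report)
```

The same dictionary can seed the Health Report: one number per field, measured before any cleanup, so later phases have a baseline to compare against.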
Phase 2: Deduplication (Days 4-10)
Always cleanse duplicates first. Other cleansing operations create more duplicates if you don't.
The process:
- Run exact-match dedupe (matching email). Auto-merge with rules: keep the older record, but pull non-null values from the newer record.
- Run fuzzy-match dedupe on company name and domain. Manually review the matches before merging — fuzzy matching has false positives.
- For accounts: identify parent-subsidiary relationships that have been incorrectly stored as separate accounts. Either link them via Parent Account or merge based on policy.
- For opportunities: identify duplicate deals (same contact, same product, similar amount, similar dates) and merge or close.
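The exact-match merge rule can be sketched as follows. This is an illustration, not how any particular dedupe tool works. It assumes "pull non-null values from the newer record" means the newer values fill blanks on the older record and never overwrite it; the field names are hypothetical.

```python
from datetime import date


def merge_exact_duplicates(rows):
    """Merge rows sharing an email: keep the oldest, fill its blanks from newer rows."""
    survivors = {}
    # Process oldest first so the first row seen per email becomes the survivor.
    for row in sorted(rows, key=lambda r: r["created"]):
        key = row["email"].strip().lower()
        if key not in survivors:
            survivors[key] = dict(row)
            continue
        kept = survivors[key]
        for field, value in row.items():
            # Assumption: newer values only fill blanks, never overwrite.
            if kept.get(field) in (None, "") and value not in (None, ""):
                kept[field] = value
    return list(survivors.values())


contacts = [
    {"email": "Ana@Acme.com", "created": date(2024, 3, 1),
     "phone": "+15551234567", "title": None},
    {"email": "ana@acme.com", "created": date(2021, 6, 15),
     "phone": None, "title": "VP Sales"},
    {"email": "bo@globex.com", "created": date(2023, 1, 9),
     "phone": None, "title": "Analyst"},
]

merged = merge_exact_duplicates(contacts)
for row in merged:
    print(row)
```

Here the 2021 record survives and picks up the phone number from the 2024 record. If your policy is that newer values should win on conflict, the fill condition is the one line to change.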
Tools we use: Salesforce's native duplicate rules, HubSpot's dedupe tool, plus a third-party tool (Insycle, DemandTools, Dedupely) for fuzzy matching at scale.
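For a sense of what fuzzy matching does under the hood, here is a rough sketch using Python's standard-library difflib. It is not the algorithm any of the tools above use. The 0.85 threshold, the suffix list, and the sample accounts are assumptions, and the output is a review queue rather than an automatic merge, in line with the manual-review step above.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Assumption: a short list of legal suffixes to ignore when comparing names.
SUFFIXES = re.compile(r"\b(?:inc|llc|ltd|corp|co)\b\.?")


def name_key(name):
    """Lowercase, drop legal suffixes and punctuation. Used for comparison only."""
    name = SUFFIXES.sub("", name.lower())
    name = re.sub(r"[^a-z0-9 ]", "", name)
    return " ".join(name.split())


def review_queue(accounts, threshold=0.85):
    """Return candidate duplicate pairs for manual review, best matches first."""
    pairs = []
    for a, b in combinations(accounts, 2):
        same_domain = bool(a["domain"]) and a["domain"] == b["domain"]
        score = SequenceMatcher(None, name_key(a["name"]), name_key(b["name"])).ratio()
        if same_domain or score >= threshold:
            pairs.append((round(score, 2), a["name"], b["name"]))
    return sorted(pairs, reverse=True)


accounts = [
    {"name": "Acme, Inc.", "domain": "acme.com"},
    {"name": "ACME Inc", "domain": "acme.com"},
    {"name": "Initech LLC", "domain": "initech.com"},
    {"name": "Initec", "domain": "initec.io"},
    {"name": "Globex Corp", "domain": "globex.com"},
]

queue = review_queue(accounts)
for score, first, second in queue:
    print(score, first, "<->", second)
```

Comparing every pair is quadratic in the number of accounts, so this only works on small exports. At CRM scale you would group records first (for example by domain or first letters of the name), which is part of what the dedicated tools handle for you.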
Phase 3: Format Normalization (Days 8-14)
Once duplicates are clean, normalize the fields that drive automation.
The fields that always need normalization:
- Country. Map every variation to a standard list (ISO country codes or your own short list).
- State / Province. Standardize to 2-letter codes or full names — pick one.
- Phone number. Apply E.164 format (e.g., +15551234567, with a leading plus sign and no spaces or hyphens) or your standard.
- Industry. Map to a controlled vocabulary (your custom list, or a standard like NAICS / SIC).
- Job title. Group into levels (C-level, VP, Director, Manager, IC) and functions (Sales, Marketing, Engineering, etc.).
- Company name. Standardize legal suffixes (Inc., LLC, Ltd.), strip extraneous characters, fix capitalization.
The fastest way to normalize is a one-time bulk update via Data Loader or HubSpot's bulk edit. The right way is a one-time bulk update plus automated normalization on every future create/update event. We'll cover the automation in Phase 6.
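Three of these normalizations can be sketched as small, pure functions that work the same in a one-time bulk pass and in an on-create workflow. The alias table, suffix table, and function names below are illustrative assumptions. The phone helper only handles US/Canada numbers; for international data a dedicated library such as phonenumbers is a safer choice.

```python
import re

# Assumption: a short target list of ISO 3166-1 alpha-2 codes.
COUNTRY_ALIASES = {
    "us": "US", "usa": "US", "united states": "US", "u.s.": "US",
    "uk": "GB", "united kingdom": "GB", "great britain": "GB",
    "ca": "CA", "canada": "CA",
}

# Assumption: the house style for legal suffixes.
LEGAL_SUFFIXES = {"inc": "Inc.", "llc": "LLC", "ltd": "Ltd.", "corp": "Corp."}


def normalize_country(raw):
    """Map a free-text country to a code; None means 'not in the list yet'."""
    if not raw:
        return None
    return COUNTRY_ALIASES.get(raw.strip().lower())


def normalize_us_phone(raw):
    """Convert a US/Canada number to E.164 (+1XXXXXXXXXX); None if it does not fit."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return f"+1{digits}" if len(digits) == 10 else None


def normalize_company(raw):
    """Collapse whitespace, drop commas, and standardize a trailing legal suffix."""
    words = raw.replace(",", " ").split()
    if not words:
        return ""
    last = words[-1].rstrip(".").lower()
    if last in LEGAL_SUFFIXES:
        words[-1] = LEGAL_SUFFIXES[last]
    return " ".join(words)


print(normalize_country(" USA "))            # US
print(normalize_us_phone("(555) 123-4567"))  # +15551234567
print(normalize_company("Acme,  inc"))       # Acme Inc.
```

Returning None for values that do not map, rather than guessing, keeps bad inputs visible so they can be reviewed and added to the alias table.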
Phase 4: Enrichment (Days 12-21)
Fill in the missing data with external sources.
The enrichment moves:
- Domain → company information. Pull industry, company size, location, technologies in use from Apollo, Clearbit, ZoomInfo, or Crustdata.
- Email → person information. Verify the email is valid (NeverBounce, ZeroBounce), then pull current job title, LinkedIn, tenure.
- Company name → domain. For records where you have the company but not the website, reverse-lookup via Crunchbase, LinkedIn, or Apollo.
Run enrichment in batches sized to your vendor's API rate limits and your credit budget. For a CRM with 50,000 records, enrichment typically costs $500-$3,000 depending on the vendor and depth.
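The arithmetic behind that range, and a simple way to cap a run at a budget, can be sketched like this. The 3-cent price, the $1,000 budget, and the batch size of 500 are made-up inputs, not vendor pricing. Amounts are kept in whole cents to avoid floating-point rounding surprises.

```python
RECORDS = 50_000

# $500 to $3,000 across 50,000 records works out to 1 to 6 cents per record.
low_cents = 500 * 100 / RECORDS
high_cents = 3_000 * 100 / RECORDS
print(low_cents, high_cents)  # 1.0 6.0


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def plan_enrichment(record_ids, cost_cents_per_record, budget_cents, batch_size=500):
    """Return the batches that fit within the budget, plus their total cost in cents."""
    affordable = budget_cents // cost_cents_per_record
    selected = record_ids[:affordable]
    plan = list(batches(selected, batch_size))
    return plan, len(selected) * cost_cents_per_record


# Hypothetical inputs: 3 cents per record against a $1,000 budget.
record_ids = list(range(RECORDS))
plan, total_cents = plan_enrichment(record_ids, 3, 100_000)
print(len(plan), "batches,", total_cents / 100, "dollars")
```

Since the budget covers only part of the list in this example, the order of `record_ids` matters: sort so the records that drive automation (active accounts, open opportunities) are enriched first.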
Phase 5: Stale Data Triage (Days 18-25)
Decide what to do with records that haven't engaged in a long time.
The three options for stale records:
- Archive — move to a "cold storage" object or status, exclude from active reporting and automation.
- Re-engage — run a re-permission campaign or "still interested?" email before deleting.
- Delete — for records that have no activity in 3+ years, no opens in 12+ months, and no relevance to current business.
Don't delete leads en masse. Leads might be cold today, hot in 18 months. Archive instead.
For Contacts attached to active Accounts, almost never delete — the relationship matters even if the individual hasn't engaged.
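The triage rules above can be written down as a single decision function. This is one possible reading of them, not a definitive policy. The one-year staleness cutoff, the field names, and the rule that deletion only follows an unanswered re-permission campaign are assumptions added for the sketch.

```python
from datetime import date


def triage(record, today):
    """Return 'keep', 'archive', 're-engage', or 'delete' for one record."""
    years_idle = (today - record["last_activity"]).days / 365
    months_since_open = (today - record["last_open"]).days / 30

    if years_idle < 1:
        return "keep"  # assumption: under a year idle does not count as stale
    if record["type"] == "contact" and record["account_active"]:
        return "keep"  # contacts on active accounts are not removed
    if record["type"] == "lead":
        return "archive"  # leads are archived, not deleted en masse
    delete_eligible = (
        years_idle >= 3 and months_since_open >= 12 and not record["relevant"]
    )
    if not delete_eligible:
        return "archive"
    # Assumption: deletion only follows an unanswered re-permission campaign.
    return "delete" if record["re_permission_sent"] else "re-engage"


TODAY = date(2025, 1, 1)
BASE = {
    "type": "contact", "account_active": False, "relevant": False,
    "re_permission_sent": False,
    "last_activity": date(2021, 6, 1), "last_open": date(2023, 1, 1),
}
samples = {
    "cold lead": {**BASE, "type": "lead"},
    "contact on active account": {**BASE, "account_active": True},
    "never asked": BASE,
    "asked, no reply": {**BASE, "re_permission_sent": True},
    "idle two years": {**BASE, "last_activity": date(2023, 1, 1)},
}
decisions = {name: triage(rec, TODAY) for name, rec in samples.items()}
for name, decision in decisions.items():
    print(f"{name}: {decision}")
```

Writing the policy as code forces the thresholds to be explicit, and it lets you dry-run the decision counts across the whole CRM before anything is archived or deleted.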
Phase 6: Ongoing Hygiene Automation (Days 22-30)
The cleanse is one-time. The maintenance is forever. Build the automation that prevents the bad data from coming back.
The automation we always ship:
- Form-time normalization. Every form submission triggers a workflow that normalizes phone, country, state, and other key fields before the record is created.
- Enrichment-on-create. New records auto-enriched against your domain/email enrichment service. Missing key fields filled in.
- Duplicate prevention. Real-time duplicate detection on Lead/Contact create. Block the duplicate or merge automatically based on rules.
- Field validation rules. Required fields stay required. Format rules catch bad data at entry.
- Monthly data quality digest. Automated workflow that runs the Phase 1 audit reports and posts the results to a #revops channel in Slack. Visibility prevents drift.
- Ownership reassignment automation. When a user is deactivated, their records auto-reassign based on rules (territory, account ownership chains, manager).
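As a rough sketch of how the first few items fit together, the function below trims and normalizes an incoming record, enforces required fields, and blocks exact-email duplicates before create. The function name, the required-field list, and the in-memory set of existing emails are hypothetical stand-ins. In practice this logic lives in your CRM's validation rules, duplicate rules, or a workflow step in n8n or Zapier.

```python
# Assumption: the fields your automation cannot run without.
REQUIRED = ("email", "country")


class DuplicateRecord(Exception):
    """Raised when an incoming record matches an existing one."""


def prepare_for_create(incoming, existing_emails):
    """Normalize, validate, and duplicate-check a record before it is created."""
    # Trim whitespace on every string field.
    record = {k: v.strip() if isinstance(v, str) else v for k, v in incoming.items()}
    record["email"] = (record.get("email") or "").lower()

    missing = [field for field in REQUIRED if not record.get(field)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    if "@" not in record["email"]:
        raise ValueError("email is not in a valid format")
    if record["email"] in existing_emails:
        raise DuplicateRecord(record["email"])
    return record


existing = {"ana@acme.com"}
created = prepare_for_create({"email": " Bo@Globex.com ", "country": "US"}, existing)
print(created)

for candidate in (
    {"email": "ANA@acme.com", "country": "US"},  # duplicate of an existing record
    {"email": "cy@initech.com", "country": ""},  # missing required field
):
    try:
        prepare_for_create(candidate, existing)
    except (DuplicateRecord, ValueError) as error:
        print(type(error).__name__, error)
```

Whether a duplicate is blocked or merged is a policy choice. Raising a distinct exception keeps that choice in the caller instead of burying it in the check.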
The Tools
For Salesforce: Native Duplicate Rules and Validation Rules for the basics. DemandTools or Cloudingo for bulk operations. Apollo or Clearbit for enrichment. Talend or Salesforce Data Pipelines for ongoing normalization.
For HubSpot: Native deduplication and the Format Data action (Operations Hub). Insycle for advanced dedupe and bulk operations. Apollo or Clearbit for enrichment. Operations Hub's data quality tools at the Pro/Enterprise tier.
For everyone: n8n or Zapier for the cross-system automation that ties it together.
The Three Most Common Mistakes
1. Cleansing without automation behind it. A one-time cleanse without ongoing prevention rebuilds the mess in 6-12 months. Always build the maintenance automation as part of the cleansing project.
2. Deleting before archiving. Once a record is deleted in Salesforce, recovering it is painful. Archive first, monitor for problems, delete after 90+ days if nothing depends on the archived records.
3. Normalizing without a controlled vocabulary. Picking "let's standardize Industry" without defining the destination list creates a new mess. Define the target values before you start mapping.
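Mistake 3 is the easiest to guard against in code: define the destination list first, refuse to map to anything outside it, and report what did not map instead of inventing a value. The target list and variants below are hypothetical examples, not a recommended taxonomy.

```python
from collections import Counter

# Step 1: the destination list, agreed before any mapping starts (hypothetical).
INDUSTRY_TARGETS = {"Software", "Financial Services", "Healthcare"}

# Step 2: observed variants mapped onto that list.
INDUSTRY_MAP = {
    "saas": "Software",
    "software-as-a-service": "Software",
    "b2b software": "Software",
    "fintech": "Financial Services",
    "banking": "Financial Services",
}
# Guard: every mapping must land on an approved target value.
assert set(INDUSTRY_MAP.values()) <= INDUSTRY_TARGETS

# Target values also map to themselves, case-insensitively.
LOOKUP = {**{t.lower(): t for t in INDUSTRY_TARGETS}, **INDUSTRY_MAP}


def map_industries(raw_values):
    """Return (mapped values, Counter of inputs that have no mapping yet)."""
    mapped, unmapped = [], Counter()
    for raw in raw_values:
        target = LOOKUP.get(raw.strip().lower())
        mapped.append(target)
        if target is None:
            unmapped[raw.strip()] += 1
    return mapped, unmapped


mapped, unmapped = map_industries(["SaaS", "software", "Fintech ", "Crypto", "Crypto"])
print(mapped)
print(unmapped.most_common())
```

The unmapped counter is the useful output: it tells you which variants to add to the map next, ranked by how many records they affect.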
What "Good" Data Quality Looks Like
The metrics we target for a clean B2B CRM:
- Duplicate rate under 3% on Contacts and Accounts.
- Required fields populated on 95%+ of active records.
- Country, State, Industry normalized to controlled vocabulary on 90%+ of records.
- Email bounce rate under 5%.
- Ownership: 100% of active records have an active user as owner.
These numbers won't be hit on Day 1 of automation. They're the target you steer toward over 90-180 days.
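These targets are simple enough to encode as a scorecard that a monthly digest can post. The sketch below assumes the measured percentages come from your audit reports; the metric names and sample values are invented for illustration.

```python
# Targets from the list above: ("max", x) means "under x", ("min", x) means "x or more".
TARGETS = {
    "duplicate_rate_pct": ("max", 3.0),
    "required_fields_populated_pct": ("min", 95.0),
    "normalized_fields_pct": ("min", 90.0),
    "email_bounce_rate_pct": ("max", 5.0),
    "active_owner_pct": ("min", 100.0),
}


def scorecard(measured):
    """Return 'pass' or 'miss' for each target metric."""
    results = {}
    for metric, (kind, target) in TARGETS.items():
        value = measured[metric]
        ok = value < target if kind == "max" else value >= target
        results[metric] = "pass" if ok else "miss"
    return results


# Hypothetical measurements from a mid-project audit.
measured = {
    "duplicate_rate_pct": 2.4,
    "required_fields_populated_pct": 96.0,
    "normalized_fields_pct": 84.5,
    "email_bounce_rate_pct": 5.0,
    "active_owner_pct": 100.0,
}
results = scorecard(measured)
for metric, status in results.items():
    print(f"{metric}: {measured[metric]} -> {status}")
```

Note that a bounce rate of exactly 5.0 counts as a miss here, because the target is "under 5%".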
Is a CRM Data Cleansing Project Right for Your Team?
If you're about to start an automation project, or if your current automation is producing weird results, missing records, or unreliable reports, then yes. Do this first. In our experience, every dollar spent on cleansing saves roughly three dollars of automation rework.
If your CRM is genuinely clean (rare, but it happens), skip to the automation. Build the hygiene workflows from Phase 6 anyway, as preventive maintenance.
At Ops Automators, we cleanse CRMs as part of our automation projects, not as a separate engagement. The data work is invisible if we don't talk about it, but it's the reason our automation actually works. If you're looking at a CRM you don't trust, that's exactly the work we do.
Ready to automate? Book a free discovery call and we'll audit your CRM data quality.