Deduping in Salesforce: The complete how-to

Don’t be duped by Salesforce.com duplicate records. Deduping seems like one of those routine system maintenance issues, but it’s actually very tricky and there's no undo.

1 2 3 4 5 6 7 Page 5
Page 5 of 7

Dealing with individual dupes

It would be nice if fixing just a few duplicate records were as easy as fixing simple data quality problems.  But dupes aren’t a simple problem because they involve an identity crisis:  which of the dopplegangers has better data?  Which of the pair (or trio) should be the surviving record?  What are the repercussions of deduping?

The most important things to remember:

  • There is no “undo” from a bogus merge.  It’s manual reconstructive surgery and some data will be irretrievably lost.  If you’re lucky, that data won’t matter.
  • Merging a contact is pretty low-risk, and merging leads is even safer. 
  • Merging an account is much more likely to entail real risk, multiplied exponentially by any external integrations you have with your system.  There are many cases where merging accounts isn’t the right thing to do and there are several reasonable alternative solution strategies (as discussed above).  So…be really sure you need to go down the merge path!
  • If you possibly can, do merges in a sandbox first, so you can see the first-level repercussions before you do it with live data.  Even if you’ve done it in a sandbox first, do some test merges (with junk data) in the production system to make sure that systems integrated with your CRM aren’t going to react in nasty ways.
  • A merge in SFDC preserves all “related lists” (child or related records) from both of the merged records.  But for the fields within the two records being merged, only one value per field can be preserved.  If you need to preserve or concatenate data from the “victim” record into the surviving one, you’ll have to do that manually after the fact (which means you need to be prepared before the merge).

Almost inevitably, there will be some information in the about-to-be-destroyed record that you might want to have around after the merge, and the first step is to figure out what those fields might be.  Create a new field in the table called "Legacy Data" of the LONG TEXT data type for the SFDC table in question. You'll need to add this new field to at least one page layout for each object. For sure, the old record ID, owner, stage/status, type, phone number, and lead source are needed...but you should ask your users what else they want (such as mailing address).

While they're figuring that out, use the Data Loader or your favorite tool to grab all the records of that object and put it in a spreadsheet.  Make sure to change niggling things like description, phone, and other text fields to TEXT data type and the Zip code into ZIP CODE data type before manipulating them in Excel. Create a formula in Excel that concatenates all these fields in an intelligent way (like: ="Old ID = "&ID &" | "& Old Owner = "&Owner.Name& " | "....). 

Once you've got that right, push the resulting values into the Legacy_Data__c field you created in SFDC.  If you want to be prepared for the future, you can create a workflow to do this on new records, just make sure that the workflow rule only runs when Legacy_Data__c is blank, so that you don't scribble over existing information.

For the moment, Salesforce’s classic UI has more flexibility in its merging features than Lightning does.  At some point, Lightning will be better, but since you need to do your merging today, I’m going to describe how to get things done in classic.

In the system’s data import wizards, there's a way to look for duplicates based on email address and other items. In addition, there is the built-in duplicate detection/prevention feature that can be set up with clever parameters to block (or at least flag) dupes.  These work fine, but you'll find some dupes still get through (for example, “GM” vs “General Motors” or “John Jones” vs “Jack Jones”).  

Finding the onesy-twosy dupes is typically done with spreadsheets and formula fields, looking for things like string matches on email address, phone, company name or website URL.  If things get subtle, you’ll want to use Microsoft’s free fuzzy matching tool (described here).

Deduping Leads

  • Open a notepad or textedit instance on your desktop.  You’ll need this for a scratch pad.
  • Log the names and ID numbers of the records you’re going to merge and the reason why.  You will want this “intention log” when somebody invariably questions why their record disappeared. 
  • Get to one of the dupe Leads by using SFDC’s search bar. 
  • Look near the top of the page for the button labeled “Find Duplicates” and push it.  (If the button isn’t there, put it in the Lead page layout and make sure you have the “delete” and “modify all data” permissions on the Lead object.)
  • The first step of the Find wizard lets you select which fields to match on.  Adjust to your needs.
  • You’ll be presented with a list of potentially matching Leads, Contacts, Accounts and Opportunities.  These are all helpful for context, but the only thing you can actually merge from this page will be Leads.
  • Select the Leads (two or three of them) you want to merge and click the “Merge Leads” button.
  • You’ll be presented with two or three columns of data, showing the fields that are the same and those that are different between/among the records. 
  • The first row indicates which will be the Master (surviving) record.  Typically, choose the record that was created first, or at least created by the most relevant person.
  • In other rows, if you’re not given the choice, the data is bitwise identical.  If you are given a choice, choose the best data.
  • Select the Legacy Data field from the LOSING (about to be wiped out) record, not the master.
  • What if you need to preserve both values (e.g., a second mobile phone number or third email address)?  Copy/paste the values you need to preserve into your scratch pad.
  • Once you’re really sure you’ve selected the right things, hit the Merge button.
  • Edit the surviving Lead if needed to paste the data from your scratch pad into the relevant field(s).  Don’t forget to hit the Save button in SFDC.
  • Save your scratch-pad/intention log under a unique name and put it somewhere safe.
  • But…but…what if you wanted to merge a Lead into a Contact?  In that case, you’ll need to NOT hit the Merge Lead button and instead go through the Convert Lead wizard and then merge the resulting Contact afterwards.

Deduping Contacts

  • As with deduping Leads, you'll want to open a notepad or textedit instance on your desktop to use as a scratch pad.
  • Again, log the names and ID numbers of the records you’re going to merge, and the reason why.
  • Using SFDC’s search bar to navigate to one of the dupe Contacts.  Click on the record’s Account field to take you up to that level.
  • Hopefully, all of the Contacts you want to merge are listed under that Account.  If not, you’ll have to navigate to those other Contacts and move them (one by one) to the Account you’ll be working with.  Simple, huh?
  • From the Account page (that now contains all the Contacts you want), click on the “Merge Contacts” button.  (If the button isn’t there, put it in the Account page layout and make sure you have the “delete” and “modify all data” permissions on the Contact object.)
  • The first step of the wizard lets you select which Contacts (two or three of them) you want to merge.  Do that and click the “Next” button.
  • You’ll be presented with two or three columns of data, showing the fields that are the same and those that are different between/among the records. 
  • The first row indicates which will be the Master (surviving) record.  Typically, choose the record that was created first, or at least created by the most relevant person.
  • In other rows, if you’re not given the choice, the data is bitwise identical.  If you are given a choice, choose the best data.
  • Select the Legacy Data field from the LOSING (about to be wiped out) record, not the master.
  • What if you need to preserve both values (e.g., a second mobile phone number or third email address)?  Copy/paste the values you need to preserve into your scratch pad.
  • Once you’re really sure you’ve selected the right things, hit the Merge button.
  • Edit the surviving Contact if needed to paste the data from your scratch pad into the relevant field(s).  Don’t forget to hit the Save on the Contact.
  • Save your scratch-pad/intention log under a unique name and put it somewhere safe.

Deduping Accounts

  • Follow the steps above for opening a scratch pad, logging names and IDs and explanations.
  • Go the Accounts’ “home page” and scroll to the bottom of that overview page.  Look in the “Tools” section for the “Merge Accounts” link.  Click on that.
  • The first step of the wizard lets you search for companies.  The only way you can do that is by name.  If the dupes have different names, you’ll need to rename the “victim” (non-surviving) company so it’s similar enough to be found with a simple string search...but keep the names different enough so you know which should be the master.  Click the 'Find' button.
  • The next step of the wizard lets you select which Accounts you want to merge.  Do that and click the “Next” button.
  • You’ll be presented with two or three columns of data, showing the fields that are the same and those that are different between/among the records. 
  • The first row indicates which will be the Master (surviving) record.  Typically, choose the record that was created first, or at least created by the most relevant person.
  • In other rows, if you’re not given the choice, the data is bitwise identical.  If you are given a choice, choose the best data.
  • Select the Legacy Data field from the LOSING (about to be wiped out) record, not the master.
  • What if you need to preserve both values (e.g., notes or business description)?  Copy/paste the values you need to preserve into your scratch pad.
  • Once you’re really sure you’ve done the right things, hit the Merge button.
  • If needed, edit the surviving Account to paste the data from your scratch pad into the relevant field(s).  Don’t forget to  Save the Account when you’re done.
  • Save your scratch-pad/intention log under a unique name and put it somewhere safe.

Deduping other tables

SFDC doesn’t provide wizards for deduping other objects, but with a small amount of code (cut/pasted into SFDC’s Developer Console and its lovely Execute Anonymous window), you can leverage SFDC’s merge DML operation, which works for Cases, Opportunities, Solutions, and Tasks.  The merge call is pretty limited (it doesn’t give you any field-by-field selection, just letting you choose the surviving record), so you’ll definitely want to have a full backup copy of the victim record in your scratch pad so you can edit any fields where the surviving record didn’t hold the best data.

For tables that go beyond those listed above, you’ll have to figure out a different strategy:  true merges aren’t available, but there are lots of tricks you can do with clever editing and DML.  Have fun with that (but be careful!).

1 2 3 4 5 6 7 Page 5
Page 5 of 7
Download CIO's Winter 2021 digital issue: Supercharging IT innovation