Python for Salesforce Real-Life Challenge: NPSP Email Deduping

12 Nov 2018 🔖 salesforce python questions
💬 EN

Answer
Salesforce NPSP: How Email Addresses Work
De-duping named email Fields within a single record
De-duping named email fields across records

Salesforce #AwesomeAdmin, reader, and “Office Hours” attendee Matthew (@matthewsalibi) got Python up and running on his computer, and is ready to dive into looking for duplicates in his Salesforce “Contacts” table. But he’s on the Nonprofit Starter Pack (NPSP), and a given email address might be in any of several fields. Today we’ll lay out his question … stay tuned for the answer!

Thanks, Matthew, for submitting the questions below (and for teaching me not only that Janet the robot has a last name, 😲 but that that there’s a “load known duplicates from a CSV file” option in DemandTools – that is going to save me hours): 🙌

Answer

Click here for the answer to this problem

Salesforce NPSP: How Email Addresses Work

In NPSP (NPSP=Nonprofit Starter Pack, the open-source package that runs 27,000 nonprofits!), there are three named email fields, which, combined with a “Preferred Email” picklist, populate the default Salesforce email field (using Apex code):

(So, if I populate the PersonalEmail field and set the picklist value to Personal, an NPSP trigger writes the PersonalEmail value to the default Email field. The consequence of this is just basically that we ignore the default email field in NPSP, since it’s always populated by automation.)

De-duping named email Fields within a single record

Using Python, it would be cool to be able to look across these named email fields, find places where the same email address is duplicated within a single record, and null out the non-preferred version.

So, taking a dataset like:

FirstName	LastName	Preferred Email	PersonalEmail	WorkEmail	AlternateEmail
Chidi	Anagonye	Personal	[email protected]	[email protected]	[email protected]
Tahani	Al-Jamil	Personal	[email protected]	[email protected]	[email protected]
Jason	Mendoza	Work	[email protected]	[email protected]	[email protected]

We could output a version where the duplicates that are not in the “preferred” version of the field are nulled out:

FirstName	LastName	Preferred Email	PersonalEmail	WorkEmail	AlternateEmail
Chidi	Anagonye	Personal	[email protected]		[email protected]
Tahani	Al-Jamil	Personal	[email protected]	[email protected]
Jason	Mendoza	Work		[email protected]

If it’s possible to only output changed rows, that would be extra-awesome!

De-duping named email fields across records

One limitation of native Salesforce duplicate management, and also of DemandTools, is that you can only check a field against itself; that is, you can only compare the PersonalEmail field on one record to the PersonalEmail field on another record. So, if [email protected] is in the PersonalEmail field on one record but in the WorkEmail field on a duplicated record, you could not easily see that using DemandTools or native duplicate record jobs.

So, taking a data set like:

SalesforceID	Human-Readable ID	FirstName	LastName	PersonalEmail	WorkEmail	AlternateEmail
003…01	BBBB	Chidi	Anagonye	[email protected]
003…02	DDDD	Eleanor	Shellstrop	[email protected]
003…03	EEEE	Jason	Mendoza	[email protected]
003…04	FFFF	Eleanor	Shellstrop	[email protected]		[email protected]
003…05	GGGG	Janet	Della-Denunzio	[email protected]		[email protected]
003…06	HHHH	Tahani	Al-Jamil	[email protected]
003…07	IIII	Chidi	Anagonye		[email protected]
003…08	JJJJ	Mindy	St. Claire	[email protected]
003…09	KKKK	Kamilah	Al-Jamil	[email protected]
003…10	AAAA	Eleanor	Shellstrop			[email protected]
003…11	CCCC	Ellie	Shellstrop		[email protected]

We should:

Check across the named email fields for duplicates
For identified duplicates, check the first names
If the first names are an exact match, output a file for DemandTools with the Salesforce IDs that correspond to the (alphabetically) lowest Human-Readable ID in the “Master” column, and any duplicates’ Salesforce IDs in “duplicate” columns, like this:

Master	Duplicate1	Duplicate2
003…10	003…02	003…04
003…01	003…07

This produces a file that is ready to go into DemandTools for single table dedupe using the advanced field-level merge criteria we’ve already built, woohoo!

For those records where a duplicate email is found, but there is no FirstName match, it would be great to output an exception report, that looks like this:

Duplicated Email	Master	MasterFirstName	Master LastName	Duplicate1	Duplicate1FirstName	Duplicate1LastName
[email protected]	003…03	Jason	Mendoza	003…05	Janet	Della-Denunzio
[email protected]	003…10	Eleanor	Shellstrop	003…11	Ellie	Shellstrop

A human user can look at this exception report, and guess that on line 2, what’s likely happened is that Janet sometimes uses Jason’s email, but the email is more closely identified with Jason (because the email is “jm@”, and Jason’s last name starts with “M”). The user may then decide to remove the address from Janet’s record.

Looking at row 3, a human user may decide that those records are true duplicates. The user could review this output file, mark the “true dupes” and sort them out, then remove all columns except for “Master” and “Duplicate”, and feed that to DemandTools to complete a de-dupe using an existing DemandTools scenario.

Tweet Share on LinkedIn

Previous Post Next Post

--- ---

Salesforce, Python, SQL, & other ways to put your data where you need it

Need event music? 🎸

Python for Salesforce Real-Life Challenge: NPSP Email Deduping

Table of Contents

Answer

Salesforce NPSP: How Email Addresses Work

De-duping named email Fields within a single record

De-duping named email fields across records