Salesforce, Python, SQL, & other ways to put your data where you need it

Need event music? 🎸

Live and recorded jazz, pop, and meditative music for your virtual conference / Zoom wedding / yoga class / private party with quality sound and a smooth technical experience

Better Salesforce Insert/Update Operations with Jitterbit Caching

28 Oct 2019 🔖 salesforce jitterbit integration tutorials tips
💬 EN

Post Header Image

Table of Contents

Combined with “external ID” fields, Salesforce’s “upsert” data-loading option is a godsend for dumping data from legacy systems into Salesforce.

But what do you do if you only want to insert new records or only want to update existing records, using the external ID as an inter-system matching key, considering Salesforce’s “insert” and “update” operations don’t support matching on external IDs?

The Jitterbit ETL tool makes it relatively easy to use a SOQL query ahead of an insert or update operation and use data from the query to filter which “upload candidate” records actually get sent to Salesforce. The trick is to write code in Jitterbit’s scripting language.


TL;DR for experienced Jitterbit programmers

This is a step-by-step tutorial showing how to write small bits of code that use the Jitterbit execution environment’s memory to save a hashmap of IDs from both sides of an ETL process that’s about to run, and then to use your hashmap at data-upload-time on a record-by-record basis.

Skip to the parts that mention “<trans>”, the “Operation Script” sections, and the pro-tips / further considerations sections for the goodies.

This does not take the place of native Jitterbit “diff” tooling if you need field-by-field comparisons. This is just a simple “record exists?” check. However, it’s far less fragile and prone to needing resets than the “Jitterbit diff” tooling if “record exists?” is the only information you need for branching logic.


Today’s Data

In My Source Database

My “data source” is an ordinary Oracle database whose People table looks like this when I run the following SQL query:

SELECT Id_Column, First_Name, Last_Name, Phone_Number, Email_Address
FROM People
Id_Column First_Name Last_Name Phone_Number Email_Address
ERP10006 Darweesh Daher 444-444-4444 [email protected]
ERP17685 Helen Hopper 888-888-8888 [email protected]
ERP33014 Grace Gao 777-777-7777 [email protected]
ERP71892 Cathy Combs 333-333-3333 [email protected]

Screenshot

In Salesforce

My “data destination” is a Salesforce org whose Contact table looks like this when I run the following SOQL query:

SELECT Id, Name, My_External_Id__c, Phone, Email
FROM Contact
Id Name My_External_Id__c Phone Email
003010101010101 Anush Amjit ERP28190    
003020202020202 Benita Borges      
003030303030303 Cathy Combs ERP71892    
003040404040404 Darweesh Daher ERP10006    
003050505050505 Ezra Ellis      
003060606060606 Frances Fulvia      

Screenshot

Yes, Someone’s Missing

Note that even though Salesforce believes there’s a record in my “external data source” with ID ERP28190 named Amjit Anush, it seems that Amjit has been deleted from my external database.

Dealing with that data integrity issue is another problem for another Jitterbit post.


ETL Business Rules

Today we’re going to deal with some pretty unrealistic business rules, for example’s sake.

Update

  1. Whenever someone exists in both databases, we want to feed their Phone Number into Salesforce.
  2. We’ll leave First Name, Last Name, and E-Mail Address alone upon record updates.

In other words, if we were to do an update right now, we’d expect to populate Cathy’s and Darweesh’s phone numbers but not their e-mail addresses.

Insert

  1. Whenever someone exists in Oracle but not in Salesforce, we want to feed their First Name, Last Name, External ID, and E-Mail Address into Salesforce.
  2. We’ll leave Phone Number alone on upon record inserts.

In other words, if we were to do an insert right now, we’d expect to add new records for Grace and Helen to Salesforce (both records having blank phone numbers but non-blank e-mail addresses in Salesforce).


Jitterbit: Creating A Dictionary

What The Dictionary Will Look Like

Ultimately, what we’re going to do today is run the following SOQL query:

SELECT Id, My_External_Id__c
FROM Contact
WHERE My_External_Id__c <> NULL

Which will give us data that looks like this:

Id My_External_Id__c
003010101010101 ERP28190
003030303030303 ERP71892
003040404040404 ERP10006

We’ll transform and store that data into a data structure called a dictionary for use as a “cache” of values we can reference within Jitterbit without running another SOQL query.

The contents of our dictionary will look like this:

  • ERP10006: 003040404040404
  • ERP28190: 003010101010101
  • ERP71892: 003030303030303

Note: If you’ve ever done Apex programming within Salesforce, you may have written similar code where you looped over the results of a SOQL query and used them to fill a Map<Id,Id> with values for later reference.


Declare A Variable

First, declare a Jitterbit “global variable.” I’m going to call mine “existingContactIdsByExtId.”

Screenshot


Tell Jitterbit How To Log Into Salesforce

Before I write any code to populate existingContactIdsByExtId, I need to tell Jitterbit how to talk to my Salesforce, so I’ll create a “Salesforce Org” object in Jitterbit and give it my login credentials:

Screenshot


Populate The Variable

Now I’ll create a new “Script” and call it “Lazy-Populate existingContactIdsByExtId.”

Here’s the Jitterbit-script code I’ll put into it:

<trans>
If( IsNull( $existingContactIdsByExtId) || $existingContactIdsByExtId == "" ,
	WriteToOperationLog( "$existingContactIdsByExtId was null -- building it" );
	arr = SfLookupAll(
		"<TAG>Salesforce Orgs/[email protected]</TAG>"
		,"SELECT My_External_Id__c, Id FROM Contact WHERE My_External_Id__c <> null"
	);
	$existingContactIdsByExtId = Dict();
	i = 0;
	While( i < Length(arr) ,
		$existingContactIdsByExtId[arr[i][0]] = arr[i][1];
		i = i + 1;
	);
);
</trans>

Screenshot

Code Notes

Lazy Code

Note that this script is “lazy.”

That means that if we have already put data into existingContactIdsByExtId during the runtime of a Jitterbit operation, a 2nd+ run of this script would skip running all code within the If() statement.

What Our Data Looks Like

Although Jitterbit doesn’t show us what the contents of the variable arr look like when we test-run our script, here’s what it would look like if we could get it to display (I have my tricks…):

{{ERP28190, 003010101010101}, {ERP71892, 003030303030303}, {ERP10006, 003040404040404}}

And existingContactIdsByExtId looks like this:

[ERP10006=>"003040404040404",ERP28190=>"003010101010101",ERP71892=>"003030303030303"]

Jitterbit: Updating Salesforce

Update Operation

Next, I used Jitterbit to create an ordinary Salesforce Update operation that queries my Oracle database as a source.

I’ll trust that you know how to do that in Jitterbit (or are comfortable figuring out how to do it … or work with someone who is) and are here to learn my “caching” trick.

I named my operation “Update Contacts By External ID.”

Transformation

Mapping: Phone

Once you’ve set that up, in your Salesforce operation details screen, click the “Edit” button by “Mappings” to bring up the transformation involved in the operation (for me, it’s called “Update Contacts By External ID Request“).

Drag Phone_Number at left onto Phone at right.

Condition (execute vs. skip record)

Now we have to tell Jitterbit to detect that Grace and Helen aren’t in Salesforce and skip over trying to update their Salesforce records.

Right-click the [E*] Contact folder at top and click “Add Condition.”

Screenshot

A new node will appear at right just below the “Contact” folder called “Condition.”

Screenshot

Double-click it. A “Formula Builder” window will appear for editing the node named “root$transaction.body$update$Contact.Condition.”

Screenshot

At left, between the <trans> and </trans> tags, type:

HasKey($existingContactIdsByExtId, ID_COLUMN);

(Note that my Oracle database upper-cases column names when queried, so what I was previously calling Id_Column needs to be referred to in this “Jitterbit transformation formula” as ID_COLUMN.)

If I click the “Test” button, I can see that the result is 0 (false), indicating that ID ERP10006 (Darweesh Daher) is in existingContactIdsByExtId.

Screenshot

Hmmm. That’s not right. Darweesh actually is in my Salesforce Contact table with ERP10006 on file.

But the Jitterbit “Test” button doesn’t know that. As far as it’s concerned, existingContactIdsByExtId is just an empty variable.

To test our HasKey() code and ensure that we wrote it correctly, we can add 2 lines to the “Condition” formula so that it runs our script and populates existingContactIdsByExtId with data:

$existingContactIdsByExtId = '';
RunScript("<TAG>Scripts/Lazy-Populate existingContactIdsByExtId</TAG>");
HasKey($existingContactIdsByExtId, ID_COLUMN);

That’s better: now the answer is 1 (true). This means that that Darweesh’s record will be processed when we run our Salesforce Update operation.

Screenshot

Just keep in mind that we don’t actually want to leave those extra $existingContactIdsByExtId = '' or RunScript() lines of code in our condition once we’re done testing it.

It’s wasteful of Jitterbit’s CPU time (and hence our real-world time) – particularly if we have a lot of rows in our database query.

There’s no real-world need to try to “populate existingContactIdsByExtId” every time we try to load a new row of data from our database into Salesforce. It’s much better off being done just before the entire “Update” operation kicks off as a whole.

Next, in the upper-right area, I’m going to click the little “1 of 4” next to the “Manual SQL” folder under the “Source” folder, which should flip it to say “2 of 4” and show ID ERP17685’s (Helen Hopper’s) data instead.

I’ll hit “Test” once again to validate:

Screenshot

Satisfied that my condition lets Darweesh through but skips over Helen, I’ll delete my “testing-only” lines of code so that my formula simply reads as follows:

<trans>
HasKey($existingContactIdsByExtId, ID_COLUMN);
</trans>

I click “OK” to close the formula editor, then immediately click the “Save” button at top right area of my “Transformation” editor pane.

Mapping: ID

I have one more job to do before I can save, deploy, and close my “Transformation”: since this is an Update operation, I need to specify a Contact ID for each row that I’m updating in Salesforce.

My Oracle database has no idea what the answer to this question is (it just thinks of Darweesh as ERP10006), but luckily, Jitterbit knows.

It just has to ask existingContactIdsByExtId what the Salesforce Contact ID for ERP10006 is (in this case, 003040404040404).

I double-click on “Id” at right to bring up its formula editor, setting the code between the <trans> and </trans> tags to be:

$existingContactIdsByExtId[ID_COLUMN];

Screenshot

(I clicked “# of 4” repeatedly until it looped back around to “1 of 4” again so that I could see test data with Darweesh, but as you can see, Jitterbit isn’t happy, because it’s convinced that the contents of existingContactIdsByExtId is a bunch of blank text _(a “string”).

As before, I can add 2 quick $existingContactIdsByExtId = ''; and RunScript("<TAG>Scripts/Lazy-Populate existingContactIdsByExtId</TAG>"); lines of code beforehand that I won’t keep in production to make sure things work.

Screenshot


Operation Script

We need to add one more thing to our Salesforce Update Operation before it will work: we need to ensure that every time it runs, the first thing it does is execute the “Lazy-Populate existingContactIdsByExtId” script.

From your Salesforce Update Operation’s configuration screen, click the “Edit” button next to “Operation.”

Screenshot

Right-click on the node of that Operation representing your data source (mine is called “My Oracle Database” because I’m creative like that) and click “Insert Before This,” next clicking “Script.”

Screenshot

Right-click on the new leftmost node called “Script” and click “Select Existing Script.”

Choose “Lazy-Populate existingContactIdsByExtId” and click “OK.”

Drag the leftmost node around until things look pretty, then use the “Save” and “Deploy” icons in your upper-right corner of this operation’s editor tab to ensure that your work isn’t lost. It should look something like this:

Screenshot


Validation

Let’s try running our Operation. Find it at left under “Jitterbit Connect” -> “Salesforce” -> “Salesforce Updates” -> (whatever you named it), right-click on it, and click “Run Update.”

Screenshot

Hooray – it works!

Celebration GIF

When I run the following SOQL query:

SELECT Id, Name, My_External_Id__c, Phone, Email
FROM Contact

My data looks like this:

Id Name My_External_Id__c Phone Email
003010101010101 Anush Amjit ERP28190    
003020202020202 Benita Borges      
003030303030303 Cathy Combs ERP71892 333-333-3333  
003040404040404 Darweesh Daher ERP10006 444-444-4444  
003050505050505 Ezra Ellis      
003060606060606 Frances Fulvia      


Jitterbit: Inserting Into Salesforce

Insert Operation

Time to try inserting.

I used Jitterbit to create an ordinary Salesforce Insert operation that queries my Oracle database as a source.

Again, I’ll trust that you know how to do this in Jitterbit (or have a plan to figure it out).

I named my operation “Insert Contacts By External ID.”

Transformation

Mapping: Name, Phone, External ID

Once you’ve set that up, in your Salesforce operation details screen, click the “Edit” button by “Mappings” to bring up the transformation involved in the operation (for me, it’s called “Insert Contacts By External ID Request“).

I dragged ID_COLUMN onto My_External_Id__c, FIRST_NAME onto FirstName, LAST_NAME onto LastName, and EMAIL_ADDRESS onto Email. No surprises there.

Condition (execute vs. skip record)

Now I have to do the same kind of “Add Condition” steps that I did above for my Update transformation, only the formula is going to be different.

In this case, we want:

<trans>
!HasKey($existingContactIdsByExtId, ID_COLUMN);
</trans>

The difference is the exclamation point before the HasKey – it serves as a “not” and flips the true/false results of HasKey.

Of course, we can test with the following script:

<trans>
$existingContactIdsByExtId = '';
RunScript("<TAG>Scripts/Lazy-Populate existingContactIdsByExtId</TAG>");
!HasKey($existingContactIdsByExtId, ID_COLUMN);
</trans>

Now we can see that Darweesh returns 0 (false) and Helen returns 1 (true). That’s perfect: we want to insert Helen but not Darweesh.

We’ll save and deploy our transformation (making sure we took any test lines of code out of our “Condition” formula).

(There’s no ID mapping)

Note that we don’t have to play any games with the Salesforce Id field on an Insert the way we did on an Update operation. We’ll just leave it un-mapped.

In fact, because you don’t need the “value” side of the dictionary at all for insert operations. If you know you’ll never need an Update operation, your initial script can just query this SOQL:

SELECT My_External_Id__c
FROM Contact
WHERE My_External_Id__c <> null

Inside the loop, you can assign null() to the “value” side of every dictionary entry, effectively using the dictionary as a set (which Jitterbit’s scripting language doesn’t have a notion of).

$existingContactIdsByExtId[arr[i][0]] = null();

Operation Script

As with Update, need to add one more thing to our Salesforce Insert Operation before it will work: we need to ensure that every time it runs, the first thing it does is execute the “Lazy-Populate existingContactIdsByExtId” script.

From your Salesforce Insert Operation’s configuration screen, click the “Edit” button next to “Operation.”

Right-click on the node of that Operation representing your data source and click “Insert Before This,” next clicking “Script.”

Right-click on the new leftmost node called “Script” and click “Select Existing Script.”

Choose “Lazy-Populate existingContactIdsByExtId” and click “OK.”

Drag the leftmost node around until things look pretty, then use the “Save” and “Deploy” icons in your upper-right corner of this operation’s editor tab to ensure that your work isn’t lost. It should look something like this:

Screenshot


Validation

Let’s try running our Operation. Find it at left under “Jitterbit Connect” -> “Salesforce” -> “Salesforce Inserts” -> (whatever you named it), right-click on it, and click “Run Insert.”

Screenshot

It works again. Yay.

Celebration GIF

When I run the following SOQL query:

SELECT Id, Name, My_External_Id__c, Phone, Email
FROM Contact

My data looks like this:

Id Name My_External_Id__c Phone Email
003010101010101 Anush Amjit ERP28190    
003020202020202 Benita Borges      
003030303030303 Cathy Combs ERP71892 333-333-3333  
003040404040404 Darweesh Daher ERP10006 444-444-4444  
003050505050505 Ezra Ellis      
003060606060606 Frances Fulvia      
003050505050505 Grace Gao ERP33014   [email protected]
003060606060606 Hopper Helen ERP17685   [email protected]

Screenshot


Fix A Typo

Oh wait – there’s a typo.

Dejected GIF

My database had Helen’s first and last name flipped.

Screenshot of previous data with Helen highlighted

(Did you all notice this whole time? Am I the only person who didn’t notice in any of my previous screenshots about her? 😳)

Can I Just Update?

All right – I just fixed her data in my source database.

In the real world, I’d probably have my UPDATE on a daily schedule, so Grace’s and Helen’s phone numbers are likely to end up in Salesforce by tomorrow, anyway.

How about I just manually run another update now?

Screenshot

…*Drumroll*…

Id Name My_External_Id__c Phone Email
003010101010101 Anush Amjit ERP28190    
003020202020202 Benita Borges      
003030303030303 Cathy Combs ERP71892 333-333-3333  
003040404040404 Darweesh Daher ERP10006 444-444-4444  
003050505050505 Ezra Ellis      
003060606060606 Frances Fulvia      
003050505050505 Grace Gao ERP33014 777-777-7777 [email protected]
003060606060606 Hopper Helen ERP17685 888-888-8888 [email protected]

Screenshot

*Womp Womp*

Grace and Helen now have phone numbers, but Helen’s name isn’t fixed … because my business rules I implemented said not to change anything but “phone number” in Salesforce upon update.

2 Options To Actually Fix It

At this point, I can either:

  1. Fix Helen manually in Salesforce, or
  2. Delete her from Salesforce and run the INSERT again.

I’m going to go with the latter for funsies.

Screenshot

Screenshot

…*Drumroll*…

Id Name My_External_Id__c Phone Email
003010101010101 Anush Amjit ERP28190    
003020202020202 Benita Borges      
003030303030303 Cathy Combs ERP71892 333-333-3333  
003040404040404 Darweesh Daher ERP10006 444-444-4444  
003050505050505 Ezra Ellis      
003060606060606 Frances Fulvia      
003050505050505 Grace Gao ERP33014 777-777-7777 [email protected]
003060606060606 Helen Hopper ERP17685   [email protected]

Screenshot

That’s better.

Lesson Learned?

  • Q: Is there a moral to this mistake?
  • A: Maybe …
    • Don’t double your work and break out a business process into “insert” and “update” operations where “upsert” will do?
    • (That is, if your field-by-field mappings are the same for each process.)

I really painted myself into a corner by deciding I needed a special phone-only update process as a “business rule.”


Consolidation Pro Tip

I recommend that you kick off every individual operation with an execution of the “Lazy-Populate existingContactIdsByExtId” Jitterbit script so that it’s easy to test or run the components individually.

Then, if you’re planning to run them back-to-back in actual scheduling, make a new Script-only operation called “Run Me” with a script called “Execution Controller” that has, say, the following contents:

<trans>
RunScript("<TAG>Scripts/Lazy-Populate existingContactIdsByExtId</TAG>");
RunOperation("<TAG>Operations/Jitterbit Connect™/Salesforce Inserts/Insert Contacts By External ID</TAG>");
RunOperation("<TAG>Operations/Jitterbit Connect™/Salesforce Updates/Update Contacts By External ID</TAG>");
</trans>

Technically, when “Run Me” runs, “Lazy-Populate existingContactIdsByExtId” will run 3 times (once because of “Execution Controller” and twice more because of the sub-operations), but because it’s “lazy,” it won’t actually do much the 2nd and 3rd times it runs.

I think it’s worthwhile to keep the “lazy” script as part of each sub-operation so that each sub-operation remains independently testable.

After all, it’s not like I’m running the script for every row that gets data-loaded (ahem … you did take that “test code” out of your row-by-row transformation logic, right?).


Further Considerations

I hope this post has helped give you an architecture to handle business cases when you do need to have special “insert by external ID” vs. “update by external ID” business processes.

One thing that’s really fun about this architecture is that Jitterbit couldn’t care less if you’re using a proper Salesforce external ID as the “matching” field.

I’ve done this with, say, a concatenation of FirstName + "|"+ LastName + "|" + Email as the “key” to my dictionary (keep in mind that if multiple Salesforce Contacts share a name+email, you’d only have one result stored in your “Jitterbit cache” dictionary variable, so think through how you want to handle that, business-logic-wise).

Note that I don’t typically load data directly on “soft” cache-matches like that. Instead, I’m looking to surgically fill in My_External_Id__c in Salesforce before doing a normal “upsert” against My_External_Id__c.

I’ll save that architecture for another post … so stay tuned for more Jitterbit tips!

--- ---