Friday, April 7, 2017

Steps required to support POSTing multipart/form-data Content-Type from Apex

I came across an interesting challenge when transferring example images to the Einstein Predictive Vision Service (PVS). As part of PVS you can upload example images to train an image classifier. As a gross generalization - here is a picture of a cat. If you get another image like that tell me it is probably a cat.

What follows are the lengths I had to go to in order to be able to call this web service method from Apex.

In their documentation MetaMind were kind enough to provide an example cURL command to submit the example image.

curl -X POST -H "Authorization: Bearer <TOKEN>" -H "Cache-Control: no-cache" 
 -H "Content-Type: multipart/form-data"
 -F "name=77880132.jpg" -F "labelId=614" -F "data=@C:\Mountains vs Beach\Beaches\77880132.jpg" 
 https://api.metamind.io/v1/vision/datasets/57/examples

It always struck me as odd that specifying API usage via a specific command line tool has somehow become the de facto standard. It's as if describing how to use a REST API is so difficult that it is easiest to just rely on the conventions of a known command line tool. Nothing against cURL, but it hides some of the mechanics of how that API call works. In particular with this example, it obscures how the -F options compose the multipart/form-data body and how the @ in the command is substituted with the actual file from that path.

Back to uploading example images to PVS. Performing this API call from Apex proves to be a bit of a challenge as there isn't currently direct native support for multipart form callouts from Apex. Please pause your blog reading at this point and consider voting for the idea - Image upload using multipart/form-data.

My first naive attempt at replicating the cURL call from Apex using the supplied HttpFormBuilder class failed miserably. The binary blob data that I was extracting from an image stored in a ContentVersion record wasn't being accepted by MetaMind.

    public ExampleResponse addExample(string access_token, string datasetId, string labelId, string filename,
        blob exampleImageData) {
        
        string postUrl = CREATE + '/' + datasetId + '/examples';
        
        string contentType = HttpFormBuilder.GetContentType();
        System.assertEquals('multipart/form-data', contentType);
        
        //  Compose the form
        string form64 = '';

        form64 += HttpFormBuilder.WriteBoundary();
        form64 += HttpFormBuilder.WriteBodyParameter('labelId', EncodingUtil.urlEncode(labelId, 'UTF-8'));
        form64 += HttpFormBuilder.WriteBoundary();
        form64 += HttpFormBuilder.WriteBodyParameter('name', EncodingUtil.urlEncode(filename, 'UTF-8'));
        form64 += HttpFormBuilder.WriteBoundary();
        form64 += HttpFormBuilder.WriteBodyParameter('data', EncodingUtil.base64Encode(exampleImageData));
        form64 += HttpFormBuilder.WriteBoundary(HttpFormBuilder.EndingType.CrLf);
        
        blob formBlob = EncodingUtil.base64Decode(form64);
        string contentLength = string.valueOf(formBlob.size());
        //  Compose the http request
        HttpRequest httpRequest = new HttpRequest();

        httpRequest.setBodyAsBlob(formBlob);
        httpRequest.setHeader('Connection', 'keep-alive');
        httpRequest.setHeader('Content-Length', contentLength);
        httpRequest.setHeader('Content-Type', contentType);
        httpRequest.setMethod('POST');
        httpRequest.setTimeout(120000);
        httpRequest.setHeader('Authorization','Bearer ' + access_token);
        httpRequest.setEndpoint(postUrl);

        //...
    }

Where did I go wrong? Well, unlike a prediction call, there isn't a variant of this API call that accepts the Base64 encoded image. Using HttpFormBuilder.WriteBodyParameter with the Base64 encoded blob to write the data wasn't going to work. MetaMind want the actual bytes and the associated Content-Type header. Here is how the working request appears in Postman:

POST /v1/vision/datasets/1001419/examples HTTP/1.1
Host: api.metamind.io
Authorization: Bearer 12346539689dae622cbd91d9d4880b3314bfb747
Cache-Control: no-cache

----WebKitFormBoundaryE19zNvXGzXaLvS5C
Content-Disposition: form-data; name="data"; filename="ItsAFrog.png"
Content-Type: image/png

<bytes of file go here>
----WebKitFormBoundaryE19zNvXGzXaLvS5C
Content-Disposition: form-data; name="labelId"

8588
----WebKitFormBoundaryE19zNvXGzXaLvS5C
Content-Disposition: form-data; name="name"

itafrog.png
----WebKitFormBoundaryE19zNvXGzXaLvS5C

An alternative to HttpFormBuilder.WriteBodyParameter was needed: one that would set the correct Content-Disposition and Content-Type headers for the binary data and then also correctly append the bytes from the image. That last point is really important. Correcting the headers wasn't very difficult, but with the binary file data I found the last few bytes of the file were getting corrupted.

Thankfully Enrico Murru had already covered a lot of the details around how to get proper multipart form POSTs working in Apex, so full credit to him for the blog post POST Mutipart/form-data with HttpRequest. This was later refined by Grant Wickman in an answer to Post multipart without Base64 Encoding the body. The more I researched why HttpFormBuilder wasn't working, the more I found that MetaMind had based HttpFormBuilder partially on Enrico and Grant's work but hadn't taken it through to completion. For instance, the EndingType enum has the following comment, indicating they were clearly thinking about it, just never applied it.

Helper enum indicating how a file's base64 padding was replaced.

To get it working I needed to finish applying Enrico and Grant's solutions.

It's all about the Base(64 encoding)

The smallest unit of storage in a file is a byte (an octet), which is composed of 8 bits. In contrast, each Base64 character represents 6 bits. So 4 Base64 characters can represent 3 bytes of input data. That ratio of 4:3 is very important with Base64 encoding. If the number of input bytes to be encoded is divisible by 3 then everything is fine, as all 4 Base64 characters will represent meaningful input.

The problem occurs if the input byte length isn't divisible by 3. In that case the Base64 encoding process would normally append a padding symbol (=) or two to the end of the encoding to indicate which were the valid bytes in the last 3 and which were padding to get to the correct grouping of 3 bytes to 4 characters.

The following table shows how the padding needs to be applied if there are only two or one bytes to be encoded. You can see how some of the bits in the 6-Bit groupings no longer represent actual data when encoding only one or two input bytes.

Encoding 3 bytes (no padding needed):
  Data bytes in binary form:            11010100 00100111 11110111
  Data rearranged into 6-bit groups:    110101 000010 011111 110111
  6-bit groups in decimal form:         53 2 31 55
  Groups converted to ASCII characters: 1 C f 3

Encoding 2 bytes (one padding character needed):
  Data bytes in binary form:            11010100 00100111
  Data rearranged into 6-bit groups:    110101 000010 011100
  6-bit groups in decimal form:         53 2 28
  Groups converted to ASCII characters: 1 C c =

Encoding 1 byte (two padding characters needed):
  Data bytes in binary form:            11010100
  Data rearranged into 6-bit groups:    110101 000000
  6-bit groups in decimal form:         53 0
  Groups converted to ASCII characters: 1 A = =

Enrico's method to build up the full multipart form submission from Apex is to first build each part individually using Base64 encoding and then concatenate those parts together. That final Base64 string is then converted back to a Blob to become the body of the HTTP POST request.

The challenge with this method is that you can't have intermediate padding characters. The Apex Base64 decoding process is going to ignore those intermediate padding characters. This in turn causes an incorrect mapping between the 4 Base64 characters and the 3 bytes they are supposed to represent. Here is an example using some anonymous Apex. Note how decoding the concatenated values gives the same result with or without the internal padding:

String encodedA = EncodingUtil.base64Encode(Blob.valueOf('A'));
String encodedB = EncodingUtil.base64Encode(Blob.valueOf('BC'));
System.debug(encodedA);
System.debug(encodedB);
System.debug(EncodingUtil.base64Decode(encodedA).toString());
System.debug(EncodingUtil.base64Decode(encodedB).toString());
System.debug(EncodingUtil.base64Decode(encodedA + encodedB).toString());
System.debug(EncodingUtil.base64Decode('QQ' + 'QkM=').toString());

Output

20:17:50.65 (69059707)|USER_DEBUG|[3]|DEBUG|QQ==
20:17:50.65 (69127131)|USER_DEBUG|[4]|DEBUG|QkM=
20:17:50.65 (69373165)|USER_DEBUG|[5]|DEBUG|A
20:17:50.65 (69500491)|USER_DEBUG|[6]|DEBUG|BC
20:17:50.65 (69636544)|USER_DEBUG|[7]|DEBUG|A$
20:17:50.65 (69759126)|USER_DEBUG|[8]|DEBUG|A$

For encoding text this problem is bypassed by appending additional whitespace characters to the base text until the Base64 encoded representation no longer requires padding. Without the padding it is then possible to concatenate the two Base64 encoded values and later decode the complete string back to a Blob (the extra whitespace is harmless in the text parts). This however doesn't work with a binary file like an image. You can't just go on appending additional bytes on the end without corrupting the file.
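
Here is a quick anonymous Apex sketch of that whitespace trick, which is roughly what HttpFormBuilder does internally for the text parts of the form (the header text used here is just an example):

String headerText = 'Content-Disposition: form-data; name="labelId"';
String encoded = EncodingUtil.base64Encode(Blob.valueOf(headerText));
// Keep appending a harmless trailing space until the encoding needs no '=' padding.
while (encoded.endsWith('=')) {
    headerText += ' ';
    encoded = EncodingUtil.base64Encode(Blob.valueOf(headerText));
}
System.debug(encoded); // now safe to concatenate with other Base64 fragments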

Instead the solution presented by Grant Wickman is to borrow one or both of the CR (\r) or LF (\n) characters that separate the end of the file bytes from the multipart boundary.

By shuffling these legitimate characters onto the end of the file's bytes the need for padding can be removed. The padding on the boundary can then be adjusted as required using the whitespace technique, where it won't affect the integrity of the data.

There is a certain elegance to it as a solution. Note how the first 4 bits of the CR align with the 4 padding bits of the second Base64 character, so that character doesn't need to be changed. And again, when only switching out a single padding = for a CR, the first two bits of the CR align with the padding bits in the same way.

I find that table very satisfying. I think I might print it out and keep it at my desk like one of those inspirational posters.

Example of additional HttpFormBuilder method
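
The actual method I added is in the changes submitted to the project mentioned below. What follows is a hedged sketch of its key logic rather than the exact HttpFormBuilder signature: an illustrative helper that composes the whole file part, from its leading boundary through to the closing boundary, on the assumption the file is written as the last part of the form.

public class MultipartFileHelper {

    // Illustrative boundary value; it must match the boundary advertised in the
    // request's multipart/form-data Content-Type header.
    private static final String BOUNDARY = '1ff13444ed8140c7a32fc4e6451aa76d';

    // Pad value with trailing spaces until base64(value + lineBreaks) needs no '='
    // padding, then return that encoding. Trailing spaces on a header line or after
    // a boundary are harmless, unlike stray bytes appended to the file data.
    private static String padAndEncode(String value, String lineBreaks) {
        String encoded = EncodingUtil.base64Encode(Blob.valueOf(value + lineBreaks));
        while (encoded.endsWith('=')) {
            value += ' ';
            encoded = EncodingUtil.base64Encode(Blob.valueOf(value + lineBreaks));
        }
        return encoded;
    }

    // Returns the Base64 encoding of the final form part: leading boundary, part
    // headers, the raw file bytes, and the closing boundary. Concatenate it after
    // the Base64 of the text parts, then base64Decode the lot into the request body.
    public static String writeFileBodyParameter(String key, String filename,
            String mimeType, Blob fileBody) {
        String part = padAndEncode('--' + BOUNDARY, '\r\n');
        part += padAndEncode('Content-Disposition: form-data; name="' + key +
            '"; filename="' + filename + '"', '\r\n');
        part += padAndEncode('Content-Type: ' + mimeType, '\r\n\r\n');

        String file64 = EncodingUtil.base64Encode(fileBody);
        String footer = '--' + BOUNDARY + '--';

        if (file64.endsWith('==')) {
            // The last group held 1 data byte. Swapping '==' for '0K' is the same as
            // appending \r\n to the file bytes, so the footer needs no leading CRLF.
            file64 = file64.substring(0, file64.length() - 2) + '0K';
        } else if (file64.endsWith('=')) {
            // The last group held 2 data bytes. Swapping '=' for 'N' appends \r, so
            // only the \n still has to precede the footer.
            file64 = file64.substring(0, file64.length() - 1) + 'N';
            footer = '\n' + footer;
        } else {
            // No padding to borrow, so keep the full CRLF in front of the footer.
            footer = '\r\n' + footer;
        }

        return part + file64 + padAndEncode(footer, '');
    }
}

The '0K' and 'N' substitutions are where points 2 and 3 in the list below come from.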

What did we learn?

  1. That rogue = padding characters in the middle of a Base64 string will corrupt the data.
  2. If the Base64 encoding for the file ends with "==", crop those characters off and replace with 0K to represent a CRLF. Then don't prepend the following boundary footer with those characters.
  3. If the Base64 encoding for the file ends with "=", crop that character off and replace it with N to represent a CR. Then only prepend the following boundary footer with a LF.
  4. That @muenzpraeger has a more refined Apex project for working with the Einstein Predictive Vision Service. I've submitted my changes there.
  5. The O'Reilly Book covers featuring pictures of animals are done in a woodcut/hedcut style.
  6. Base64 encoding examples can be inspirational and look great at your desk.

Friday, March 24, 2017

The Mad Catter - Salesforce Predictive Vision Services

Disclaimer

No animals were harmed in the creation of this blog post or the associated presentation. The standard disclaimer applies.

I've got a problem. A cat problem to be precise. While I'm more of a dog person, I don't particularly mind cats. However, recently there has been a bit of an influx of them with the neighbors. There are at least a half dozen of them that roam freely around the neighborhood. Not the end of the world, but they have a nasty habit of leaving presents on the lawn for the kids to find. Things like the following:

In short, they poop everywhere and leave the remains of birds lying around. Things that aren't so great to step on or for the kids to find when playing in the garden.

I spoke with the immediate neighbor who owns two of the cats, and he suggested spraying them with water to deter them away. While that did indeed prove to be a very effective, amusing, and satisfying approach to move them on, it required 24-hour vigilance as they just kept coming back.

Get a Dog

My wife keeps saying we should just get a dog. I guess getting a dog to chase the cats away is an option. But it seems like training a dog to use the hose might be more trouble in the long run.

Technology to the Rescue

Thankfully I found a handy device that attaches to the end of the hose and activates a sprinkler head when a built-in motion detector is set off.

Perfect! Great! Cat comes into range, then a sudden noise and spray of water sends them off towards someone else's lawn to do their cat business. Problem solved and I can go back to doing more fun activities.

Except there was one small problem. The PIR motion sensor didn't particularly care what was moving in front of it. Cats, birds, the kids on their way to school, a courier with a parcel for me, a tree in the wind, the mother in law. It would spray them all regardless of whether I wanted it to or not.

Salesforce Predictive Vision Service

Technology wasn't solving my problem. I needed to use more of it!

I recalled a recent presentation by the Salesforce Developers team - Build Smarter Apps with New Predictive Vision Service. The short version of that presentation is you can train a deep learning image classifier with a set of training images. Then when you give it a new image it will give you probabilities about what is likely in the picture. I've created a quick start unmanaged package for it to save you going through most of the install steps.

To make this work I needed a large collection of cat images to train the dataset from and create a model. Luckily for me, providing pictures of cats is something that the internet excels at.

The second challenge with using the predictive vision services is managing how many images I am going to send through to the service. If I just point a web camera out the window it could be capturing 30+ frames per second. Not really practical to send off each frame to the service for identification when there might be nothing of interest happening 99% of the time.

Motion detection

I had a few options here.

Option one would be to stick with the basic PIR motion sensor, but it would still generate a ton of false positives that would need to pass through the image recognition. A simple cool-down timer would help, but the image captured immediately after the first motion is detected would likely only catch the subject as it is just entering the frame.

I figured that since I'm going to need a camera to capture the initial image anyway, I might as well get some use out of it for detecting the motion. Because of the initial processing step I can exclude motion from certain areas, such as a driveway or a tree that often moves in the wind. There can also be a slight delay after the motion is detected and before the prediction image is captured. This gives the subject time to move into the image.

The prototype solution looks like this:

  1. A webcam that can be pointed at the area to be monitored.
  2. Motion Detection software to process the video feed and determine where the movement is and the magnitude. The magnitude is useful, as particularly small subjects like birds can be completely ignored.
  3. The ability for that software to pass frames of interest off to the Salesforce Predictive Vision Service. This is a simple REST POST request using an access token (sketched in Apex after this list).
  4. If the probability from the frame indicates a Cat is present, send a signal to the Raspberry Pi.
  5. On the signal, the Raspberry Pi activates the GPIO pin connected to a relay.
  6. When activated, the relay supplies power to the existing automated sprinkler, which activates on initial power on when the sensitivity is set to maximum. Another option here is directly connecting a solenoid water valve to the hose line.
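
For a sense of what step 3 involves, here is a rough sketch of the prediction call, written in Apex for consistency with the rest of this blog even though the real request is sent by the desktop motion detection software. The /v1/vision/predict endpoint, the sampleLocation and modelId form fields, and the GeneralImageClassifier model name are my reading of the pilot documentation, so treat them as assumptions; HttpFormBuilder is the helper class from the multipart post above.

String accessToken = '<TOKEN>'; // placeholder - use the real access token
String frameUrl = 'https://example.com/frame-of-interest.jpg'; // hypothetical frame location

String form64 = '';
form64 += HttpFormBuilder.WriteBoundary();
form64 += HttpFormBuilder.WriteBodyParameter('modelId', 'GeneralImageClassifier');
form64 += HttpFormBuilder.WriteBoundary();
form64 += HttpFormBuilder.WriteBodyParameter('sampleLocation', frameUrl);
form64 += HttpFormBuilder.WriteBoundary(HttpFormBuilder.EndingType.CrLf);

HttpRequest req = new HttpRequest();
req.setMethod('POST');
req.setEndpoint('https://api.metamind.io/v1/vision/predict');
req.setHeader('Authorization', 'Bearer ' + accessToken);
req.setHeader('Content-Type', HttpFormBuilder.GetContentType());
req.setBodyAsBlob(EncodingUtil.base64Decode(form64));
req.setTimeout(120000);

HttpResponse res = new Http().send(req);
System.debug(res.getBody()); // JSON with a probability per label, e.g. "cat"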

When all put together the end result looks something like this:

The Einstein bust with terminator-esque glowing red eyes was part of the presentation I gave on this topic.

While filming that video I inadvertently live tested it on myself as well. An aging fitting on the hose connector to the sprinkler had come loose outside at the tap. So I went out to fix that, restored the water pressure to the sprinkler, then walked back to the laptop to resume the test. Only when I checked the motion detection screen did I realize it had captured my image passing in front of the sprinkler. Thankfully the predictive vision services came back indicating I didn't resemble a cat and the sprinkler didn't activate. Success!

Refinements

It occurred to me that there were further improvements that could be made.

The first and easiest change I made was to activate on things other than cats. It can be equally selective in activating on a wandering neighbor's dog, squirrels, general wildlife, etc...

I needed a way to deal with unfortunate false positives, such as a person wearing something with a picture of a cat on it. These can partially be avoided by looking at all the probabilities that Einstein is returning and having thresholds against each label. I.e. Activate on any Cat prediction above 50% unless there is any prediction indicating a person in the field of view. Images kept from the activations could also be used to further refine the examples in the dataset.
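
In Apex-flavoured pseudocode (the real check lives in the desktop app, and the label names and thresholds here are only illustrative), the activation rule looks something like this:

Map<String, Decimal> probabilities = new Map<String, Decimal>{
    'cat' => 0.62, 'teddy bear' => 0.21, 'person' => 0.04 };
Map<String, Decimal> activateThresholds = new Map<String, Decimal>{ 'cat' => 0.50 };
Map<String, Decimal> overrideThresholds = new Map<String, Decimal>{ 'person' => 0.10 };

// Any override label over its threshold blocks activation outright.
Boolean overridden = false;
for (String label : overrideThresholds.keySet()) {
    if (probabilities.containsKey(label)
            && probabilities.get(label) >= overrideThresholds.get(label)) {
        overridden = true;
    }
}

// Otherwise activate when any watched label clears its threshold.
Boolean activate = false;
if (!overridden) {
    for (String label : activateThresholds.keySet()) {
        if (probabilities.containsKey(label)
                && probabilities.get(label) >= activateThresholds.get(label)) {
            activate = true;
        }
    }
}
System.debug('Activate sprinkler: ' + activate);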

These first two refinements are actually present in the video above. When using the general image classifier it typically identifies the stuffed cat as a teddy bear. So in the small section to the bottom right of the app you can mark labels to activate on and labels that will override and prevent the activation.

Other changes I might consider making:

The motion sensor could be maintained and introduced as the step prior to activating the video feed. This would increase the time between the target entering the area and the sprinkler activating, but would save a lot of endless processing loops looking at an unchanging image.

If I forgo some of the more processing intensive motion tracking the whole solution could be moved onto the Raspberry Pi. This would make it a much more economical solution.

However, another option with the motion detection still in place would be to crop the frame image to just the area where the motion was detected. This should lead to much higher prediction accuracy as it is only focusing on the moving subject.

When real world testing commences with live subjects I'll need to add a video capture option to cover the time from just before the sprinkler is activated till just after it switches off. I think the results will be worth the extra effort.

I have a range of other devices that could easily be activated via the relay attached to the Raspberry Pi. One such device is an ultrasonic pest repeller. Perhaps combined with a temperature sensor as a slightly kinder deterrent on cold nights.

User Group Presentation

I gave a talk to the Sydney developer user group on this project. The slides, as they were:


I still feel the need to settle on a name for the project. Options include:

  • The Mad Catter (after the elusive Catter Trailhead badge)
  • The Einstein Cannon
  • The Cattinator (After the general theme of the presentation.)


Thursday, March 2, 2017

Salesforce SOAP Callout debugging trickery

Here's a handy practice when making SOAP callouts from Salesforce and handling errors.

When a Callout goes pear-shaped and you get an exception, keep track of the request parameters by doing a JSON serialize and keeping the result in a custom object.

Then in the dev packaging org you can rehydrate the same request by deserializing the JSON from the custom object and making the same callout. Because you are now in a dev org you can see the raw SOAP message in the CALLOUT_REQUEST logging.
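
A minimal sketch of the capture side, assuming the WSDL-generated SoapWebService classes used below and that Error_Details__c.ReferenceData__c is a long text area field big enough to hold the serialized request:

public static void updateOrderWithCapture(SoapWebService.ServiceCredential credential,
        SoapWebService.Order order) {
    SoapWebService.BasicHttpBinding_IConnectNS service = new SoapWebService.BasicHttpBinding_IConnectNS();
    try {
        service.UpdateOrder(credential, order);
    } catch (Exception e) {
        // Snapshot the request parameters so the exact callout can be replayed later.
        insert new Error_Details__c(ReferenceData__c = JSON.serialize(order));
        throw e;
    }
}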

string jsonData = [Select ReferenceData__c from Error_Details__c where ID = 'a084000000w6ReO'].ReferenceData__c;

SoapWebService.Order order = (SoapWebService.Order)JSON.deserialize(jsonData, SoapWebService.Order.class);

SoapWebService.ServiceCredential credential = new SoapWebService.ServiceCredential();

SoapWebService.BasicHttpBinding_IConnectNS service = new SoapWebService.BasicHttpBinding_IConnectNS();
service.UpdateOrder(credential, order);

From there you can take the raw SOAP request over to something like SOAP UI to debug it further.

Friday, February 10, 2017

Visualforce Quick Start with the Salesforce Predictive Vision Services

Salesforce recently released the Salesforce Predictive Vision Services Pilot. You can watch the corresponding webinar.

I went through the Apex Quick Start steps and thought I could simplify them a bit. At the end of the process you should have the basic Apex and a visualforce page to test image predictions against.

Steps

  1. Sign up for a predictive services account using a Developer Org. The instructions here are fairly straight forward.
    Go to https://metamind.io/ and use the Free Sign Up link. OAuth to your dev org. Download the resulting predictive_services.pem file that contains your private key and make note of the "you've signed up with" email address. You will need the file later, and the email address if your org user's email address differs.
    • Note: the signup is associated with the Users Email address, not the username. So you might get conflicts between multiple dev orgs sharing the same email address.
  2. Upload your predictive_services.pem private key to the same developer org into Files and title it 'predictive_services'. This title is used by the Apex code to get the details of the private key (see the sketch after these steps).
  3. Install the unmanaged package that I created (Requires Spring '17).
    I've pulled the required parts together from https://github.com/salesforceidentity/jwt and https://github.com/MetaMind/apex-utils. I've also made some modifications to the Visualforce page and corresponding controller to give more flexibility defining the image URL.
  4. Browse to the PredictService Visualforce Tab.
  5. Press the [Vision From Url] button.
  6. Examine the predictions against the General Image Model Class List.
  7. Change the Image URL to something publicly accessible and repeat the previous couple of steps as required.
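
As an aside, here is a sketch of how the Apex code can read the private key back out of Files by that title (step 2). The PEM markers are whatever appears in the downloaded file, so adjust them as needed:

// Grab the most recent file titled 'predictive_services' and extract the key body.
ContentVersion keyFile = [
    SELECT Title, VersionData
    FROM ContentVersion
    WHERE Title = 'predictive_services'
    ORDER BY CreatedDate DESC
    LIMIT 1
];
String keyContents = keyFile.VersionData.toString();
// Strip the PEM armour so only the Base64 key material remains.
keyContents = keyContents.replace('-----BEGIN RSA PRIVATE KEY-----', '');
keyContents = keyContents.replace('-----END RSA PRIVATE KEY-----', '');
keyContents = keyContents.replace('\n', '');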

Thursday, February 2, 2017

FuseIT SFDC Explorer 3.5.17023.3 - The more logging edition

The latest v3.5 release of the FuseIT SFDC Explorer is out and contains a couple of new features around Apex Debug logs.

The Challenge and Premise

Understanding an Apex log can require understanding events occurring at vastly different time scales.

At the very fine end, each event timestamp is supplemented with the elapsed time in nanoseconds since the start of the request. At the other end is the duration of the entire log itself, which can span seconds or even minutes of execution time.

By my figuring that is around 10 orders of magnitude difference.

To try and put that into perspective...

Duration        Example (using something very fast)
1 nanosecond    Light travels 30 centimeters (12 inches)
1 minute        Light travels 17,990,000 kilometers (11,180,000 miles)

So while in a nanosecond reflected light could travel from your hand to your eyes, in a minute it could travel between the earth and the moon 46 times. Or around the circumference of the earth almost 450 times. Yes, I'm playing a bit fast and loose using the speed of light in a vacuum, but you get the general gist of how vastly different a nanosecond duration is to seconds or minutes. It takes one billion nanoseconds to make a second, and that is a very big number when you are dealing with log events.

That's enough of a detour trying to make the point that we are dealing with periods of time at vastly different scales. I'll now take a similarly cavalier approach to how I'm going to address this challenge.

The human brain processes visual data 60,000 times faster than text

That's an awesome quote for what I'm trying to demonstrate, except it doesn't seem to be backed up by any actual research. Let's roll with it anyway.

When looking at a log it is useful to see how an event's timing relates to the events immediately around it and where it sits in the overall transaction. To that end, I've started plotting the debug log events out in a timeline view under the core log.

"But Daniel" you say, "The Developer Console Log Inspector has had the Execution Overview for yonks, why do we need another log viewer?"
To which I reply, "Are you British? Because yonks sounds like a unit of time you would hear about when watching something from the BBC." and "How on earth did you include a hyperlink in speech? That's some next level DOM injection right there."

My primary reason for making a log parser has always been that the Developer Console is of no use to you if you can't load the log of interest into it. Logs don't just come from working directly in the console. They get emailed to you in the "Developer script exception" emails, or from a well meaning admin. They get saved to disk and then examined days after the fact. In cases like these the Developer Console can't help you at all.

While the FuseIT SFDC Explorer will happily load logs captured directly in the org, it can also have them pasted straight in and parse them all the same.

Debug log Timeline view

I've deliberately tried to avoid making a carbon copy of the existing Developer Console functionality. What would be the point? Instead I've looked for a way to visualize all the events in one timeline view. Of course, with some things occurring so closely together the finer details get lost. Where I've found it useful is:

  • in identifying clumps of events,
  • where an event sits in relation to the rest of the log, and
  • to jump to events of importance quickly.

Let's look at the timeline that came out of a test class run. The log had reached the 2MB limit and covered 13,000 events over 39,500 lines. One of the test methods failed, and we want to hone in on that in the log.

Note the bang icon in the middle of the timeline. Clicking on that takes us straight to the FATAL_ERROR in question.

Debug log Tree view

The Developer Console provides both the Stack Tree and Execution Stack for the currently selected event. I've always found these a little odd, to be honest, in the slight disconnect with the actual log events. E.g. USER_DEBUG becomes "debug".

Let's start with something simple. Execute anonymous for a for loop that does 8 iterations of a debug statement.

for(integer i = 0; i < 8; i++) {
    System.debug(i);
}

The Developer Console shows the 8 debug statements. All with a duration of 0.01 ms with the exception of one that took 0.06 ms. The Execution Stack shows similar details for the currently selected event.

What can we see from the same code in the FuseIT SFDC Explorer? That depends on how you filter the log.

If you keep the default settings and open the log with [Prefilter Log] enabled, various events like SYSTEM_METHOD_ENTRY and SYSTEM_METHOD_EXIT will be completely omitted. This makes the log easier to work with, but mucks with the event durations. With logs you can easily tell when something happened, but to get an accurate duration you need a BEGIN/END or ENTRY/EXIT pair of events. Hence the duration of the first USER_DEBUG seems excessively long, as it was measured from the prior event.

If you keep all the log events in then you get a tree with very similar figures. The main difference being that you can see the ENTRY/EXIT pairs.

Real World example

Have a look at the Apex CPU time limit exceeded in tidy trigger pattern question on the Salesforce StackExchange without skipping down to the answer (NO CHEATING). Grab the apex log they attached and try and figure out what the likely cause of the CPU limit exception is.


Read on when you've figured it out...


Here's what I can tell you from the log timeline.

Notice the recurring pattern of red (before update triggers), green (validation), orange (after update triggers), and purple (workflow). As per the question they are updating 2956 Account records, so the records are processed in batches of 200. You can also see where the skipped log section is (the exclamation mark about 3/4 of the way along) and the FATAL_ERRORs at the end of the log.

If you then look at one of those batches in the tree view you can see that the triggers themselves are relatively quick and the longest duration from any of the code units is for the workflow. Definitely the smoking gun to investigate first.

I like to think that the combination of the timeline and treeview made isolating the problem much easier. Especially considering the Developer Console wasn't available in this case.

The forward looking statements

It's still very much a work in progress.

The biggest thing that stands out to me at the moment is the color coding for events. I want similar events to have similar colors, important events to stand out, less important events to fade away, and the CODE_UNIT color categories not to conflict with the event colors. This is a tricky thing to do when you struggle to name more than the standard 16 colors supported by the Windows VGA palette.

The accuracy of the duration measurements is important. In the current 3.5 release the elapsed times were all converted to C# Timespans, which lacked the nanosecond accuracy. In the next release I'll do all the calculations from the raw nanoseconds and convert to Timespans only when needed for display.

Friday, January 20, 2017

Choose Your Own Adventure - Dirty Dozen showdown with the REST API vs SOAP API vs BULK API

You're an external system to Salesforce. Stuff happened and now there are a dozen dirty records that need to be updated in Salesforce to reflect the changes. An active Salesforce Session ID (a.k.a. access token) that can be used to make API calls is available. All the records have the corresponding Salesforce Ids, so a direct update can be performed. Ignore for the moment that the records might also be deleted and in the recycle bin or fully deleted (really truly gone).

To further complicate matters, there is a quagmire of triggers, workflow, and validation on the objects in Salesforce. This is a subscriber org for a managed package, so you can't just fix those.

Which API do you use to update those records in Salesforce?
Pick a path:

  1. You use REST API PATCH requests to update records. Turn to page 666
  2. You use the REST API composite batch resource to update records. Turn to page 78
  3. You use the REST API composite tree resource to update the records. Turn to page √–1
  4. You use the SOAP API update() call. Turn to page 42
  5. You use the Bulk API to update them. Turn to page 299792458
  6. You hand craft an Apex REST web service to do the processing. Turn to page 0

REST API PATCH requests

There are 12 records and the API will only allow you to PATCH one at a time. So that's 12 API calls.

You die a slow and painful death. GAME OVER

Try Again?

Postmortem:

Each request round trips to Salesforce, processes all the triggers, workflow, and validation on each individual record, and returns the result. Individually each request is only a couple of seconds, but collectively they take way too long for the waiting user.

Request

PATCH /services/data/v38.0/sobjects/OpportunityLineItem/00k7000000eaaZBAAY HTTP/1.1
Host: na5.salesforce.com
Authorization: Bearer 00D700000000001!AQ0AQOzUlrjD_NotARealSession_x61fsbSS6GGWJ123456789mKjmhS0myiYYK_sW_zba
Content-Type: application/json

Request Body

{"End_Date__c": "2017-01-19"}

204 Response: Time (2,018 to 2,758 ms) multiplied by twelve records gives 24,216 to 33,096 ms

REST API Composite batch

You learnt your lesson with the individual REST API calls (or maybe you came straight here), so switch to a single composite batch call. This will give you one round trip to the server.

You die a (slightly less, but still very much) slow and painful death. GAME OVER

Try Again?

Postmortem:

You're down to one API request, which is good. But less than desirable things are happening in Salesforce. Each sub request in the batch is split into a separate transaction.

There is still a big penalty to pay for running the accumulation of triggers and other gunk one record at a time. The trigger bulkification can't help you as they are all separate transactions.

Also, don't forget that you can only do 25 records per batch. Not such a problem with 12 records, but it has limited scaling potential.

Request

POST /services/data/v38.0/composite/batch HTTP/1.1
Host: na5.salesforce.com
Authorization: Bearer 00D700000000001!AQ0AQOzUlrjD_StillNotARealSession_x61fsbSS6GGWJ123456789mKjmhS0myiYYK
Content-Type: application/json

Request Body

{
 "batchRequests": [{
   "method": "PATCH",
   "url": "v38.0/sobjects/OpportunityLineItem/00k7000000eaaZBAAY",
   "richInput": {
    "End_Date__c": "2017-01-19"
   }
  }, {
   "method": "PATCH",
   "url": "v38.0/sobjects/OpportunityLineItem/00k7000000eaaZCAAY",
   "richInput": {
    "End_Date__c": "2017-01-19"
   }
  }, {
   "method": "PATCH",
   "url": "v38.0/sobjects/OpportunityLineItem/00k7000000eaaZDAAY",
   "richInput": {
    "End_Date__c": "2017-01-19"
   }
  }, {
   "method": "PATCH",
   "url": "v38.0/sobjects/OpportunityLineItem/00k7000000eaaZEAAY",
   "richInput": {
    "End_Date__c": "2017-01-19"
   }
  }, {
   "method": "PATCH",
   "url": "v38.0/sobjects/OpportunityLineItem/00k7000000eaaZFAAY",
   "richInput": {
    "End_Date__c": "2017-01-19"
   }
  },
                //...
  
 ]
}

Response: Time (20,053 ms)

{
    "hasErrors": false,
    "results": [
        {
            "statusCode": 204,
            "result": null
        },
        {
            "statusCode": 204,
            "result": null
        },
        {
            "statusCode": 204,
            "result": null
        },
        {
            "statusCode": 204,
            "result": null
        },
        {
            "statusCode": 204,
            "result": null
        },
        //...
    ]
}

Bonus

Look at the log duration for each sub request. They appear to be the accumulation of time for the entire API request rather than each individual sub transaction. It certainly confused me for a bit.

REST API Composite tree

Currently (as at Spring '17) it can work with up to 200 records, which is a good start. However, the composite tree resource is only for creating records, not updating them.

You die of embarrassment from trying to use an incompatible API. GAME OVER

Try Again?

Postmortem:

Always check the documentation first.

SOAP API update call

SOAP, are you sure? That API's been rattling around since 2004 in API v5.0.

Success, the records are all updated in a reasonable timeframe.

Try something else?

Review:

One POST request, and 4262 ms later you have a response. Processing time does increase with each record added, but nowhere near the overhead of the previous REST APIs.

POST Request to https://na5.salesforce.com/services/Soap/u/38.0

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:urn="urn:partner.soap.sforce.com" xmlns:urn1="urn:sobject.partner.soap.sforce.com">
   <soapenv:Header>
      <urn:SessionHeader>
         <urn:sessionId>00D700000000001!AQ0AQOzUlrjD_SessionIdCleanedWithSoap_x61fsbSS6GGWJ123456789mKjmhS0myiYYK_sW_zba</urn:sessionId>
      </urn:SessionHeader>
   </soapenv:Header>
   <soapenv:Body>
      <urn:update>
         <urn:sObjects>
            <urn1:type>OpportunityLineItem</urn1:type>
            <urn1:fieldsToNull></urn1:fieldsToNull>
            <urn1:Id>00k7000000eaaZBAAY</urn1:Id>
            <urn1:End_Date__c>2017-01-19</urn1:End_Date__c>
         </urn:sObjects>
         <urn:sObjects>
            <urn1:type>OpportunityLineItem</urn1:type>
            <urn1:fieldsToNull></urn1:fieldsToNull>
            <urn1:Id>00k7000000eaaZCAAY</urn1:Id>
            <urn1:End_Date__c>2017-01-19</urn1:End_Date__c>
         </urn:sObjects>
         <urn:sObjects>
            <urn1:type>OpportunityLineItem</urn1:type>
            <urn1:fieldsToNull></urn1:fieldsToNull>
            <urn1:Id>00k7000000eaaZDAAY</urn1:Id>
            <urn1:End_Date__c>2017-01-19</urn1:End_Date__c>
         </urn:sObjects>
         <urn:sObjects>
            <urn1:type>OpportunityLineItem</urn1:type>
            <urn1:fieldsToNull></urn1:fieldsToNull>
            <urn1:Id>00k7000000eaaZEAAY</urn1:Id>
            <urn1:End_Date__c>2017-01-19</urn1:End_Date__c>
         </urn:sObjects>
         <!-- ... -->

      </urn:update>
   </soapenv:Body>
</soapenv:Envelope>

Response

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns="urn:partner.soap.sforce.com">
   <soapenv:Header>
      <LimitInfoHeader>
         <limitInfo>
            <current>465849</current>
            <limit>6700000</limit>
            <type>API REQUESTS</type>
         </limitInfo>
      </LimitInfoHeader>
   </soapenv:Header>
   <soapenv:Body>
      <updateResponse>
         <result>
            <id>00k7000000eaaZBAAY</id>
            <success>true</success>
         </result>
         <result>
            <id>00k7000000eaaZCAAY</id>
            <success>true</success>
         </result>
         <result>
            <id>00k7000000eaaZDAAY</id>
            <success>true</success>
         </result>
         <!-- ... -->
            
      </updateResponse>
   </soapenv:Body>
</soapenv:Envelope>

Bulk API

It's primarily billed as a way to asynchronously load large sets of data into Salesforce. Let's see how we go with only 12...

You have a harrowing brush with death by API ceremony. If the asynchronous gods favor you it is a timely update. Otherwise disgruntled users tear you limb from limb as they get fed up of waiting for the results to come back.

Try something else?

Results:

There are five API calls to be made to complete this operation on a good day. If things go bad then you might be waiting longer than expected. You need to keep polling the API for the job to complete before you can get the results back. You're also burning five API calls where you could be using one to complete the entire operation.

Create Job

Request

POST /services/async/38.0/job HTTP/1.1
Host: na5.salesforce.com
X-SFDC-Session: Bearer 00D700000000001!AQ0AQOzUlrjD_NothingToSeeHere_x61fsbSS6GGWJ123456789mKjmhS0my
Content-Type: application/xml

Request Body

<?xml version="1.0" encoding="UTF-8"?>
<jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload">
    <operation>update</operation>
    <object>OpportunityLineItem</object>
    <contentType>CSV</contentType>
</jobInfo>

Response Time (617 ms)

<?xml version="1.0" encoding="UTF-8"?>
<jobInfo
    xmlns="http://www.force.com/2009/06/asyncapi/dataload">
    <id>75070000003qVrHAAU</id>
    <operation>update</operation>
    <object>OpportunityLineItem</object>
    <createdById>00570000004uCVJAA2</createdById>
    <createdDate>2017-01-19T23:08:06.000Z</createdDate>
    <systemModstamp>2017-01-19T23:08:06.000Z</systemModstamp>
    <state>Open</state>
    <concurrencyMode>Parallel</concurrencyMode>
    <contentType>CSV</contentType>
    <numberBatchesQueued>0</numberBatchesQueued>
    <numberBatchesInProgress>0</numberBatchesInProgress>
    <numberBatchesCompleted>0</numberBatchesCompleted>
    <numberBatchesFailed>0</numberBatchesFailed>
    <numberBatchesTotal>0</numberBatchesTotal>
    <numberRecordsProcessed>0</numberRecordsProcessed>
    <numberRetries>0</numberRetries>
    <apiVersion>38.0</apiVersion>
    <numberRecordsFailed>0</numberRecordsFailed>
    <totalProcessingTime>0</totalProcessingTime>
    <apiActiveProcessingTime>0</apiActiveProcessingTime>
    <apexProcessingTime>0</apexProcessingTime>
</jobInfo>

Add a Batch to the Job

Request

POST /services/async/38.0/job/75070000003qVrHAAU/batch HTTP/1.1
Host: na5.salesforce.com
X-SFDC-Session: Bearer 00D700000000001!AQ0AQOzUlrjD_HereIsSomeWorkToDo_x61fsbSS6GGWJ123456mKjmhS0myiYYK_sW_zba
Content-Type: text/csv

Request Body

Id,End_Date__c
"00k7000000eaaZBAAY","2017-01-19"
"00k7000000eaaZCAAY","2017-01-19"
"00k7000000eaaZDAAY","2017-01-19"
"00k7000000eaaZEAAY","2017-01-19"
"00k7000000eaaZFAAY","2017-01-19"
"00k7000000eaaYDAAY","2017-01-19"
"00k7000000eaaZQAAY","2017-01-19"
"00k7000000eaaZpAAI","2017-01-19"
"00k7000000eaaa4AAA","2017-01-19"
"00k7000000eaaZkAAI","2017-01-19"
"00k7000000eaaZlAAI","2017-01-19"
"00k7000000eaaXKAAY","2017-01-19"

Response time: 964 ms

<?xml version="1.0" encoding="UTF-8"?>
<batchInfo
    xmlns="http://www.force.com/2009/06/asyncapi/dataload">
    <id>75170000005cAFMAA2</id>
    <jobId>75070000003qVrHAAU</jobId>
    <state>Queued</state>
    <createdDate>2017-01-19T23:15:21.000Z</createdDate>
    <systemModstamp>2017-01-19T23:15:21.000Z</systemModstamp>
    <numberRecordsProcessed>0</numberRecordsProcessed>
    <numberRecordsFailed>0</numberRecordsFailed>
    <totalProcessingTime>0</totalProcessingTime>
    <apiActiveProcessingTime>0</apiActiveProcessingTime>
    <apexProcessingTime>0</apexProcessingTime>
</batchInfo>

Close the Job

Request

POST /services/async/38.0/job/75070000003qVrHAAU HTTP/1.1
Host: na5.salesforce.com
X-SFDC-Session: Bearer 00D700000000001!AQ0AQOzUlrjD_AnotherApiCall_ReallyQ_x61fsbSS6GGWJ56789mKjmhS0myiYYK_sW_zba
Content-Type: application/xml; charset=UTF-8

Request Body

<?xml version="1.0" encoding="UTF-8"?>
<jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload">
  <state>Closed</state>
</jobInfo>

Response time: 1291 ms

<?xml version="1.0" encoding="UTF-8"?>
<jobInfo
    xmlns="http://www.force.com/2009/06/asyncapi/dataload">
    <id>75070000003qVrHAAU</id>
    <operation>update</operation>
    <object>OpportunityLineItem</object>
    <createdById>00570000004uCVJAA2</createdById>
    <createdDate>2017-01-19T23:08:06.000Z</createdDate>
    <systemModstamp>2017-01-19T23:08:06.000Z</systemModstamp>
    <state>Closed</state>
    <concurrencyMode>Parallel</concurrencyMode>
    <contentType>CSV</contentType>
    <numberBatchesQueued>0</numberBatchesQueued>
    <numberBatchesInProgress>0</numberBatchesInProgress>
    <numberBatchesCompleted>0</numberBatchesCompleted>
    <numberBatchesFailed>1</numberBatchesFailed>
    <numberBatchesTotal>1</numberBatchesTotal>
    <numberRecordsProcessed>0</numberRecordsProcessed>
    <numberRetries>0</numberRetries>
    <apiVersion>38.0</apiVersion>
    <numberRecordsFailed>0</numberRecordsFailed>
    <totalProcessingTime>0</totalProcessingTime>
    <apiActiveProcessingTime>0</apiActiveProcessingTime>
    <apexProcessingTime>0</apexProcessingTime>
</jobInfo>

Check the Batch Status

Request

GET /services/async/38.0/job/75070000003qVrHAAU/batch/75170000005cAFMAA2 HTTP/1.1
Host: na5.salesforce.com
X-SFDC-Session: Bearer 00D700000000001!AQ0AQOzUlrjD_LosingTheWillToLive_x61fsbSS6GGWJ126789mKjmhS0myiYYK_sW_zba

Response time: 242 ms

<?xml version="1.0" encoding="UTF-8"?>
<batchInfo
    xmlns="http://www.force.com/2009/06/asyncapi/dataload">
    <id>75170000005cAFMAA2</id>
    <jobId>75070000003qVrHAAU</jobId>
    <state>Completed</state>
    <createdDate>2017-01-19T23:27:54.000Z</createdDate>
    <systemModstamp>2017-01-19T23:27:56.000Z</systemModstamp>
    <numberRecordsProcessed>12</numberRecordsProcessed>
    <numberRecordsFailed>1</numberRecordsFailed>
    <totalProcessingTime>1889</totalProcessingTime>
    <apiActiveProcessingTime>1741</apiActiveProcessingTime>
    <apexProcessingTime>1555</apexProcessingTime>
</batchInfo>

Retrieve the Batch Results

Request

GET /services/async/38.0/job/75070000003qVrHAAU/batch/75170000005cAFMAA2/result HTTP/1.1
Host: na5.salesforce.com
X-SFDC-Session: Bearer 00D700000000001!AQ0AQOzUlrjD_AreWeThereYet_x61fsbSS6GGWJ123456789mKjmhS0myiYYK_sW_zba

Response time: 236 ms

"Id","Success","Created","Error"
"00k7000000eaaZBAAY","true","false",""
"00k7000000eaaZCAAY","true","false",""
"00k7000000eaaZDAAY","true","false",""
"00k7000000eaaZEAAY","true","false",""
"00k7000000eaaZFAAY","true","false",""
"00k7000000eaaYDAAY","true","false",""
"00k7000000eaaZQAAY","true","false",""
"00k7000000eaaZpAAI","true","false",""
"00k7000000eaaa4AAA","true","false",""
"00k7000000eaaZkAAI","true","false",""
"00k7000000eaaZlAAI","true","false",""
"00k7000000eaaXKAAY","true","false",""

Review:

With only a single call to check the batch status, it came back at a respectable 3350 ms total for all the API calls. That doesn't include any of the overhead on the client side. There could be some variance here while waiting for the async job to complete.

Apex REST Web Service

OK, I'll be honest, after all those Bulk API calls I'm exhausted. Also, I can't just deploy an Apex web service to the production org I was benchmarking against.

Your fate is ambiguous because the narrator was too lazy to test it. Go to page 0.

Try something else? or Try again?

Review:

Performance is probably "pretty good"™ with only one API call and one transaction that can use the bulkification in the triggers. However, you'll need to define the interface, maintain the code, create tests and mocks.

Revised Results

I had some time to revisit this, create an Apex REST web service in the sandbox, and test it.

It takes a bit more effort to create the Apex class with the associated test methods and then deploy them to production. The end result is a timely response.

Revised Review:

In the ideal world the Apex REST web service would be streamlined to the operation being performed. I sort of cheated a bit and created it to have the same signature as the composite batch API. It also bypasses any sort of error checking or handling.

@RestResource(urlMapping='/compositebatch/*')
global class TestRestResource {

    @HttpPatch
    global static BatchRequestResult updateOlis() {
        
        RestRequest req = RestContext.request;
        BatchRequest input = (BatchRequest)JSON.deserialize(req.requestBody.toString(), BatchRequest.class);
        
        BatchRequestResult result = new BatchRequestResult();
        result.hasErrors = false;
        result.results = new List<BatchResult>();
        
        List<OpportunityLineItem> olisToUpdate = new List<OpportunityLineItem>();
        for(BatchRequests br : input.batchRequests) {
            olisToUpdate.add(br.richInput);
            Id oliId = br.url.substringAfterLast('/');
            br.richInput.Id = oliId;
            
            result.results.add(new BatchResult(204));
        }
        System.debug('Updating: ' + olisToUpdate.size() + ' records');
        
        // Should be using Database.update so any errors could be split out.
        update olisToUpdate;  
        
       return result;
    }
    
    global class BatchRequest {
        public List<BatchRequests> batchRequests;
    }
    
    global class BatchRequests {
        public String method;
        public String url;
        public OpportunityLineItem richInput;
    }
    
    global class BatchRequestResult {
        boolean hasErrors;
        List<BatchResult> results;
    }
    
    global class BatchResult {
        public integer statusCode;
        public string result;
        
        public BatchResult(integer status) {
            this.statusCode = status;
        }
    }
    
}

This can then use exactly the same request that the composite batch did.

Response: Time (3,362 ms) against a sandbox Org

To give a relative benchmark in the same sandbox, the SOAP API took 3,172 ms. That gives a time of around 4,500 ms in "production time".

Summary

Let's recap how long it took to update our dozen dirty records:

  • REST API PATCH requests — 24,216 to 33,096 ms
  • REST API Composite batch — 20,053 ms
  • REST API Composite tree — n/a for updates
  • SOAP API update call — 4262 ms
  • Bulk API — 3350 ms = 617 ms + 964 ms + 1291 ms + n*242 ms + 236 ms
  • Apex REST Web Service — 4,517 ms (extrapolated from sandbox)

I was expecting the SOAP API to fare better against the Bulk API with such a small set of records and one API call versus five. But they came out pretty comparable.

Certainly as the number of records increases the Bulk API should leave the SOAP API in the dust. Especially with the SOAP API needing to start batching every 200 records.

The other flavors of the REST API are pretty awful when updating multiple records of the same type as they get processed in individual transactions. To be fair, that's not what they are intended for.

Your results will vary significantly as the subscriber org I was testing against had some pretty funky triggers going on. Those triggers were magnifying the impact of sub request transaction splitting by the composite batch processing. I wouldn't usually classify 4 second responses as "timely". It's all relative.

Also, I could have been more rigorous in how the timing measurements were made. E.g. trying multiple times, etc... It's pretty difficult to get consistent times when there are so many variables in a multi-tenanted environment. Repeated calls could easily create ± 500 ms variance between calls.

The idea did occur to me to Allow REST API composite batch subrequests to be processed in one transaction. That would overcome the gap in the REST API where a small number of related records could be updated in one API call.


See Also: