Friday, April 7, 2017

Steps required to support POSTing multipart/form-data Content-Type from Apex

I came across an interesting challenge when transferring example images to the Einstein Predictive Vision Service (PVS). As part of PVS you can upload example images to train an image classifier. As a gross generalization - here is a picture of a cat. If you get another image like that tell me it is probably a cat.

What follows are the lengths I had to go to in order to be able to call this web service method from Apex.

In their documentation MetaMind were kind enough to provide an example cURL command to submit the example image.

curl -X POST -H "Authorization: Bearer <TOKEN>" -H "Cache-Control: no-cache" 
 -H "Content-Type: multipart/form-data"
 -F "name=77880132.jpg" -F "labelId=614" -F "data=@C:\Mountains vs Beach\Beaches\77880132.jpg" 
 https://api.metamind.io/v1/vision/datasets/57/examples

It always struck me as odd that specifying the API usage via a specific command line tool has somehow become the de facto standard. It's like describing how to use a REST API is so difficult that it is easiest to just rely on the conventions of a known command line tool. Nothing against cURL, but it is hiding some of the mechanics of how that API call is working. In particular, this example hides how -F composes the multipart/form-data body and how the @ in the command is substituted with the contents of the file at that path.

Back to uploading example images to PVS. Performing this API call from Apex proves to be a bit of a challenge as there isn't currently direct native support for multipart form callouts from Apex. Please pause your blog reading at this point and consider voting for the idea - Image upload using multipart/form-data.

My first naive attempt at replicating the cURL call from Apex using the supplied HttpFormBuilder class failed miserably. The binary blob data that I was extracting from an image stored in a ContentVersion record wasn't being accepted by MetaMind.

    public ExampleResponse addExample(string access_token, string datasetId, string labelId, string filename,
        blob exampleImageData) {
        
        string postUrl = CREATE + '/' + datasetId + '/examples';
        
        string contentType = HttpFormBuilder.GetContentType();
        System.assertEquals('multipart/form-data', contentType);
        
        //  Compose the form
        string form64 = '';

        form64 += HttpFormBuilder.WriteBoundary();
        form64 += HttpFormBuilder.WriteBodyParameter('labelId', EncodingUtil.urlEncode(labelId, 'UTF-8'));
        form64 += HttpFormBuilder.WriteBoundary();
        form64 += HttpFormBuilder.WriteBodyParameter('name', EncodingUtil.urlEncode(filename, 'UTF-8'));
        form64 += HttpFormBuilder.WriteBoundary();
        form64 += HttpFormBuilder.WriteBodyParameter('data', EncodingUtil.base64Encode(exampleImageData));
        form64 += HttpFormBuilder.WriteBoundary(HttpFormBuilder.EndingType.CrLf);
        
        blob formBlob = EncodingUtil.base64Decode(form64);
        string contentLength = string.valueOf(formBlob.size());
        //  Compose the http request
        HttpRequest httpRequest = new HttpRequest();

        httpRequest.setBodyAsBlob(formBlob);
        httpRequest.setHeader('Connection', 'keep-alive');
        httpRequest.setHeader('Content-Length', contentLength);
        httpRequest.setHeader('Content-Type', contentType);
        httpRequest.setMethod('POST');
        httpRequest.setTimeout(120000);
        httpRequest.setHeader('Authorization','Bearer ' + access_token);
        httpRequest.setEndpoint(postUrl);

        //...
    }

Where did I go wrong? Well, unlike a prediction, there isn't a variant of this API call that accepts the Base64 encoded image. Using HttpFormBuilder.WriteBodyParameter with the Base64 encoded blob to write the data wasn't going to work. MetaMind want the actual bytes and the associated Content-Type header. Here is how the working request appears in Postman:

POST /v1/vision/datasets/1001419/examples HTTP/1.1
Host: api.metamind.io
Authorization: Bearer 12346539689dae622cbd91d9d4880b3314bfb747
Cache-Control: no-cache

----WebKitFormBoundaryE19zNvXGzXaLvS5C
Content-Disposition: form-data; name="data"; filename="ItsAFrog.png"
Content-Type: image/png

<bytes of file go here>
----WebKitFormBoundaryE19zNvXGzXaLvS5C
Content-Disposition: form-data; name="labelId"

8588
----WebKitFormBoundaryE19zNvXGzXaLvS5C
Content-Disposition: form-data; name="name"

itafrog.png
----WebKitFormBoundaryE19zNvXGzXaLvS5C

An alternative to HttpFormBuilder.WriteBodyParameter was needed: one that would set the correct Content-Disposition and Content-Type headers for the binary data and then also correctly append the bytes from the image. That last point is really important. Correcting the headers wasn't very difficult, but with the binary file data I found the last few bytes of the file were getting corrupted.

Thankfully Enrico Murru had already covered a lot of the details around how to get proper multipart form POSTs working in Apex, so full credit to him for the blog post POST Mutipart/form-data with HttpRequest. This was then later refined by Grant Wickman in an answer to Post multipart without Base64 Encoding the body. The more I researched why HttpFormBuilder wasn't working, the more I found MetaMind had based HttpFormBuilder partially on Enrico and Grant's work but hadn't taken it through to completion. For instance, the EndingType enum has the following comment, indicating they were clearly thinking about it but never applied it.

Helper enum indicating how a file's base64 padding was replaced.

To get it working I needed to finish applying Enrico and Grant's solutions.

It's all about the Base(64 encoding)

The smallest unit of storage in a file is a byte (octet), which is composed of 8 bits. In contrast, Base64 encoding represents 6 bits with each character. So 4 Base64 characters can represent 3 bytes of input data. That ratio of 4:3 is very important with Base64 encoding. If the number of input bytes to be encoded is divisible by 3 then everything is fine as all 4 Base64 characters will represent meaningful input.

The problem occurs if the input byte length isn't divisible by 3. In that case the Base64 encoding process would normally append a padding symbol (=) or two to the end of the encoding to indicate which were the valid bytes in the last 3 and which were padding to get to the correct grouping of 3 bytes to 4 characters.

The following table shows how the padding needs to be applied if there are only two or one bytes to be encoded. You can see how some of the bits in the 6-Bit groupings no longer represent actual data when encoding only one or two input bytes.

Three input bytes (no padding)
  Data Bytes In Binary Form              11010100 00100111 11110111
  Data Rearranged Into 6-Bit Groups      110101   000010   011111   110111
  6-Bit Groups in Decimal Form           53       2        31       55
  Groups Converted to ASCII Characters   1        C        f        3

Two input bytes (one padding character)
  Data Bytes In Binary Form              11010100 00100111
  Data Rearranged Into 6-Bit Groups      110101   000010   011100
  6-Bit Groups in Decimal Form           53       2        28
  Groups Converted to ASCII Characters   1        C        c        =

One input byte (two padding characters)
  Data Bytes In Binary Form              11010100
  Data Rearranged Into 6-Bit Groups      110101   000000
  6-Bit Groups in Decimal Form           53       0
  Groups Converted to ASCII Characters   1        A        =        =
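The three rows of the table can be reproduced in anonymous Apex. This is only a quick check, assuming the same input bytes as the table written in hex (11010100 is d4, 00100111 is 27, 11110111 is f7); the variable names are mine.

```apex
// Build Blobs holding exactly three, two and one of the table's bytes.
Blob threeBytes = EncodingUtil.convertFromHex('d427f7');
Blob twoBytes = EncodingUtil.convertFromHex('d427');
Blob oneByte = EncodingUtil.convertFromHex('d4');

// Three bytes fill all four characters, so no padding is added.
System.assertEquals('1Cf3', EncodingUtil.base64Encode(threeBytes));
// Two bytes leave the fourth character empty, so one '=' is appended.
System.assertEquals('1Cc=', EncodingUtil.base64Encode(twoBytes));
// One byte leaves the third and fourth characters empty, so '==' is appended.
System.assertEquals('1A==', EncodingUtil.base64Encode(oneByte));
```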

Enrico's method to build up the full multipart form submission from Apex is to first build each part individually using Base64 encoding and then concatenate those parts together. That final Base64 string is then converted back to a Blob to become the body of the HTTP POST request.

The challenge with this method is you can't have intermediate padding characters. The Apex Base64 decoding process is going to ignore those intermediate padding characters. This in turn causes an incorrect mapping between the 4 Base64 characters and the 3 bytes they are supposed to represent. Here is an example using some anonymous Apex. Note how decoding the concatenated values gives the same result with or without the internal padding:

String encodedA = EncodingUtil.base64Encode(Blob.valueOf('A'));
String encodedB = EncodingUtil.base64Encode(Blob.valueOf('BC'));
System.debug(encodedA);
System.debug(encodedB);
System.debug(EncodingUtil.base64Decode(encodedA).toString());
System.debug(EncodingUtil.base64Decode(encodedB).toString());
System.debug(EncodingUtil.base64Decode(encodedA + encodedB).toString());
System.debug(EncodingUtil.base64Decode('QQ' + 'QkM=').toString());

Output

20:17:50.65 (69059707)|USER_DEBUG|[3]|DEBUG|QQ==
20:17:50.65 (69127131)|USER_DEBUG|[4]|DEBUG|QkM=
20:17:50.65 (69373165)|USER_DEBUG|[5]|DEBUG|A
20:17:50.65 (69500491)|USER_DEBUG|[6]|DEBUG|BC
20:17:50.65 (69636544)|USER_DEBUG|[7]|DEBUG|A$
20:17:50.65 (69759126)|USER_DEBUG|[8]|DEBUG|A$

For encoding text this problem is bypassed by appending additional whitespace characters to the base text until the Base64 encoded representation no longer requires padding. Without the padding it is then possible to concatenate the two Base64 encoded values and later decode the complete string back to a Blob without any stray bytes. This, however, doesn't work with a binary file like an image. You can't just go on appending additional bytes on the end without corrupting the file.
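As a rough illustration of the whitespace technique for text, here is the earlier 'A' and 'BC' example again in anonymous Apex, this time padding the first value with spaces before concatenating. It is a sketch of the idea rather than the HttpFormBuilder code, and the variable names are my own.

```apex
String first = 'A';
String encodedFirst = EncodingUtil.base64Encode(Blob.valueOf(first));

// Keep appending spaces until the encoded text no longer ends in '=' padding.
while (encodedFirst.endsWith('=')) {
    first += ' ';
    encodedFirst = EncodingUtil.base64Encode(Blob.valueOf(first));
}

String encodedSecond = EncodingUtil.base64Encode(Blob.valueOf('BC'));

// 'A' plus two spaces is three bytes, which encodes to exactly four characters.
System.assertEquals('QSAg', encodedFirst);
// With no internal padding the concatenated value decodes cleanly.
System.assertEquals('A  BC', EncodingUtil.base64Decode(encodedFirst + encodedSecond).toString());
```

The two extra spaces are harmless in a text part such as a header line, which is exactly why the same trick cannot be used on the image bytes.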

Instead the solution presented by Grant Wickman is to borrow one or both of the CR (\r) or LF (\n) characters that separate the end of the file bytes from the multipart boundary.

By shuffling these legitimate characters onto the end of the file's bytes the need for padding can be removed. The padding on the boundary can then be adjusted as required using the whitespace technique where it won't affect the integrity of the data.

There is a certain elegance to it as a solution. Note how the first 4 bits of the CR (0000) align with the 4 padding bits at the end of the second Base64 character, so that character doesn't need to be changed. And then again, if only switching out one padding = for a CR, its first two bits (00) align with the 2 padding bits at the end of the third Base64 character.

I find that table very satisfying. I think I might print it out and keep it at my desk like one of those inspirational posters.
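The substitutions can also be checked in anonymous Apex. This sketch reuses the bytes d4 and d4 27 from the earlier padding table (my choice of sample data) and compares each patched string against the encoding of the same bytes followed by a real CR (0d) and LF (0a).

```apex
// One data byte: the encoding ends in '==', which is swapped for '0K' (CR + LF).
String oneByte64 = EncodingUtil.base64Encode(EncodingUtil.convertFromHex('d4'));
String oneBytePatched = oneByte64.substring(0, oneByte64.length() - 2) + '0K';
System.assertEquals(
    EncodingUtil.base64Encode(EncodingUtil.convertFromHex('d40d0a')),
    oneBytePatched);

// Two data bytes: the encoding ends in '=', which is swapped for 'N' (CR only).
String twoBytes64 = EncodingUtil.base64Encode(EncodingUtil.convertFromHex('d427'));
String twoBytesPatched = twoBytes64.substring(0, twoBytes64.length() - 1) + 'N';
System.assertEquals(
    EncodingUtil.base64Encode(EncodingUtil.convertFromHex('d4270d')),
    twoBytesPatched);
```

In both cases the characters that carry the file's own bits are left untouched; only the padding is replaced.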

Example of additional HttpFormBuilder method
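What follows is a minimal sketch of such a method, not the code that ended up in HttpFormBuilder. The class name, method names and boundary value are all hypothetical, and it assumes the file is the last part of the form so that the closing boundary follows it directly.

```apex
public class MultipartFileSketch {
    // Hypothetical boundary; any token that cannot occur in the payload will do.
    public static final String BOUNDARY = '1ff13444ed8140c7a32fc4e6451aa76d';

    // Base64 encodes text + lineBreaks, inserting spaces before the line breaks
    // until the encoded result needs no '=' padding.
    private static String encodeWithoutPadding(String text, String lineBreaks) {
        String encoded = EncodingUtil.base64Encode(Blob.valueOf(text + lineBreaks));
        while (encoded.endsWith('=')) {
            text += ' ';
            encoded = EncodingUtil.base64Encode(Blob.valueOf(text + lineBreaks));
        }
        return encoded;
    }

    // Returns the Base64 text for a file part followed by the closing boundary.
    public static String writeFilePart(String key, String filename, String mimeType, Blob fileData) {
        String part = encodeWithoutPadding('--' + BOUNDARY, '\r\n');
        part += encodeWithoutPadding(
            'Content-Disposition: form-data; name="' + key + '"; filename="' + filename + '"', '\r\n');
        part += encodeWithoutPadding('Content-Type: ' + mimeType, '\r\n\r\n');

        String file64 = EncodingUtil.base64Encode(fileData);
        // Line break that must still be written between the file and the boundary.
        String remainingLineBreak = '\r\n';
        if (file64.endsWith('==')) {
            // One byte in the last group: borrow both CR and LF, encoded as '0K'.
            file64 = file64.substring(0, file64.length() - 2) + '0K';
            remainingLineBreak = '';
        } else if (file64.endsWith('=')) {
            // Two bytes in the last group: borrow the CR only, encoded as 'N'.
            file64 = file64.substring(0, file64.length() - 1) + 'N';
            remainingLineBreak = '\n';
        }
        part += file64;

        // Closing boundary, preceded by whatever part of the CRLF was not borrowed.
        part += encodeWithoutPadding(remainingLineBreak + '--' + BOUNDARY + '--', '');
        return part;
    }
}
```

The returned string would then be passed through EncodingUtil.base64Decode and handed to HttpRequest.setBodyAsBlob, with the Content-Type header set to multipart/form-data plus a boundary parameter matching the value used in the body.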

What did we learn?

  1. That rogue = padding characters in the middle of a Base64 string will corrupt the data.
  2. If the Base64 encoding for the file ends with "==", crop those characters off and replace them with 0K to represent a CRLF. Then don't prepend a CRLF to the boundary footer that follows.
  3. If the Base64 encoding for the file ends with "=", crop that character off and replace it with N to represent a CR. Then only prepend an LF to the boundary footer that follows.
  4. That @muenzpraeger has a more refined Apex project for working with the Einstein Predictive Vision Service. I've submitted my changes there.
  5. The O'Reilly Book covers featuring pictures of animals are done in a woodcut/hedcut style.
  6. Base64 encoding examples can be inspirational and look great at your desk.