Friday, October 23, 2009

SOQL Pagination for Salesforce API queries with the S4S Data Connector

At the start of the month I asked a question of Stack Overflow about how to efficiently page through SOQL results. See Stack Overflow : SOQL Pagination for Salesforce API queries. As at today I haven't received any answers. There is Salesforce Ideas : SOQL Request: allow greater-than, less-than WHERE conditions for Id fields, but it requires starting a new query for each page.

I've taken a different approach with the Sitecore for Salesforce Data Connector (DC). Using the QueryResultPager I can jump to any page in the query results with minimal overhead.

The following is an example of paging over 10,000 Salesforce Leads from C#. In real world ASP.NET examples the QueryResultPager would be persisted between requests. The performance is a respectable 10 seconds in the debug build, which I think is pretty good considering I'm jumping all over the place with the page index and page size. More linear paging performs better.

using Microsoft.VisualStudio.TestTools.UnitTesting; //Only used for the Assert in the example
using FuseIT.Sitecore.SalesforceConnector.Entities;
using FuseIT.Sitecore.SalesforceConnector.Services;

//...

[TestMethod]
public void LeadsPagination()
{
    SalesforceSession salesforceSession = new SalesforceSession(
        new LoginDetails("username@example.com", "salesforcePassword"));
    LeadDataSource leadDataSource = new LeadDataSource(salesforceSession);
    string[] requiredLeadFields = new string[] { "Id", "Name", "Email", "Description" };
    QueryResultPager pager = leadDataSource.GetPager(requiredLeadFields);

    Assert.IsTrue(pager.TotalRecordCount > 10000, "The total number of records");

    var firstPage = pager.GetPage(salesforceSession, 0, 10); //Records 0 to 9
    List leads = leadDataSource.EntitysFromQueryResult(firstPage);

    var randomPage = pager.GetPage(salesforceSession, 30, 100); //Records 3000 to 3099
    leads = leadDataSource.EntitysFromQueryResult(randomPage);

    randomPage = pager.GetPage(salesforceSession, 1, 150); //Records 150 to 299
    leads = leadDataSource.EntitysFromQueryResult(randomPage);

    randomPage = pager.GetPage(salesforceSession, 17, 15); //Records 255 to 269
    leads = leadDataSource.EntitysFromQueryResult(randomPage);

    randomPage = pager.GetRecords(salesforceSession, pager.TotalRecordCount - 101, pager.TotalRecordCount - 1); //The last 100 leads
    leads = leadDataSource.EntitysFromQueryResult(randomPage);
}

Behind the scenes this is sending the SOQL query select Id, Name, Email, Description from Lead and then utilizing the QueryLocator.

For Salesforce Organizations with customizations, like fields, you would use the T4 generated classes that match your org rather than those in FuseIT.Sitecore.SalesforceConnector.Entities.
NB: These code snippets are applicable to the current 1.0.5 release.