[SalesForce] “SerialBatchApexRangeChunkHandler Internal Salesforce.com Error” from a batch execution

We are an ISV with a managed package. In the package we have an implementation of Database.Batchable that also implements Database.Stateful. The class is global, so it is part of our exposed API for use outside the package. The start, execute and finish methods are only public; they provide the basic implementation for the batch itself.

The class exposes some similar global abstract methods, onStart, onExecute and onFinish, to allow our class to be extended and processing implemented against these steps in the batch process flow. (There are good reasons why we have done this that I'm not covering in this explanation.)

Note that we use an abstract base class to provide our "secret sauce" behaviour rather than using an interface and some form of delegation arrangement because we want the implementation class name to appear in the Async Apex Jobs table, thereby allowing admins to know which batch is which.
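
As a rough illustration of why the class name matters, the following anonymous Apex lists recent batch jobs by the name of the class that was queued. It is only a sketch: the filter, the row limit and the fields selected are assumptions about what an admin might want to see, not something taken from our package.

```apex
// List the most recent batch jobs together with the name of the queued class.
// The LIMIT and ORDER BY are illustrative choices only.
for (AsyncApexJob job : [
        SELECT Id, ApexClass.Name, Status, ExtendedStatus,
               JobItemsProcessed, TotalJobItems, NumberOfErrors
        FROM AsyncApexJob
        WHERE JobType = 'BatchApex'
        ORDER BY CreatedDate DESC
        LIMIT 10]) {
    System.debug(job.ApexClass.Name + ': ' + job.Status
        + ' (' + job.ExtendedStatus + ')');
}
```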

All works fine for us when we have an extension of this class within our managed package; we can schedule the batch and have it execute fine.

To give an idea, part of the API looks like:

global abstract without sharing class AdaptiveBatch implements Database.Batchable<SObject>,
        Database.Stateful {
    public System.Iterable<SObject> start(Database.BatchableContext context) {
        ...
    }
    ...

    global abstract void onStart(Database.BatchableContext context);
}

The (customer-specific) implementation then might look like the following (adjusted with namespaces appropriately, and made global, when the implementation is outside the package):

public without sharing class TestAdaptiveBatch extends AdaptiveBatch {
    public override void onStart(Database.BatchableContext context) {
        ...
    }
    ...
}

However, when we have a customer-specific implementation extending this API outside the package we get the above error whenever the batch is executed.

We know that it isn't the typical problem of unserializable state in the batch – we have one SObject, a Date and a List<Set<Id>> in the customer-specific case and we know we can successfully JSON serialize and deserialize an instance of this batch implementation. We also know we can successfully queue the batch for execution, with this state defined, using Database.executeBatch.
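
For reference, the kind of check described above can be sketched in anonymous Apex roughly as follows. This assumes TestAdaptiveBatch has a no-argument constructor and leaves out the population of the state fields, so treat it as an outline of the check rather than our exact test.

```apex
// Assumption: TestAdaptiveBatch can be constructed without arguments.
TestAdaptiveBatch batch = new TestAdaptiveBatch();

// Round-trip the instance through JSON to confirm its state serializes.
String serialized = JSON.serialize(batch);
TestAdaptiveBatch restored =
    (TestAdaptiveBatch) JSON.deserialize(serialized, TestAdaptiveBatch.class);
System.assertNotEquals(null, restored, 'Batch should survive a JSON round trip');

// Queue the batch; this step succeeds, the failure only appears on execution.
Id jobId = Database.executeBatch(batch);
System.debug('Queued batch job: ' + jobId);
```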

When Salesforce tries to execute the batch, we get the error cited above. There is nothing useful in the debug logs, even when collected via subscriber login; just entry to the custom class, some heap allocation statements then the error itself before any methods in the custom code get invoked.

I have a suspicion that this may relate to cross-namespace issues; we have a custom class on the subscriber org (and in the org's "no namespace" namespace) that is a Database.Batchable, because it inherits that relationship from our API, but where the methods for the batchable are actually in our package namespace. This is the only conceptual difference between our in-package extension of our API and the customer-specific extension of our API.

Has anyone come across this issue and, if so, do you have a solution? Do we need to make our implementations of the start/execute/finish methods global to sort this out? Any other suggestions?

I am asking first because we will have to create a new (beta, with luck) release of our package just to test these possible changes, installing the updated package and testing the changes against a custom class on a temporary org. It would be nice to avoid a lot of iteration on this!

Thanks in advance

Best Answer

In order to investigate this further I created a new dev org on which to play, installed my package, created a specialist implementation of my AdaptiveBatch API and tried out various types of processing.

I stumbled upon the answer to the issue because I started receiving Developer Script Exception emails - I was not the recipient of these on the sandbox where we first saw this problem.

The logs confirm that the start and finish method async executions for the batch were being triggered and processed fine. The processing was failing with the strange internal error only during the initialization of the execute method async execution.

The start method returns a query locator (SOQL). Where this references one of our package's custom fields the namespace prefix was not specified. The processing seems to handle this fine, as it always has historically across our code base, with log entries like:

08:17:01.0 (183716704)|SOQL_EXECUTE_BEGIN|[188]|Aggregations:0|SELECT firstname, lastname, payment_account__c, id FROM Contact 
08:17:01.0 (219117528)|SOQL_EXECUTE_END|[188]|Rows:2
08:17:01.0 (219380886)|METHOD_EXIT|[20]|01p4J000003gjQd|sirenum.AdaptiveBatch.start(Database.BatchableContext)

You can see from this that (in my test org) there are two rows selected, and these include the "payment_account__c" field being queried. That field is actually from our package, but is named without the prefix.

However, when execute is to be invoked the log simply contains something like:

08:17:01.0 (378428)|CODE_UNIT_STARTED|[EXTERNAL]|01p4J000003gkWc|MyAdaptiveBatch
08:17:01.0 (3495717)|HEAP_ALLOCATE|[72]|Bytes:3
08:17:01.0 (3577878)|HEAP_ALLOCATE|[77]|Bytes:152
08:17:01.0 (3594977)|HEAP_ALLOCATE|[342]|Bytes:408
08:17:01.0 (3608691)|HEAP_ALLOCATE|[355]|Bytes:408
08:17:01.0 (3622224)|HEAP_ALLOCATE|[467]|Bytes:48
08:17:01.0 (3650375)|HEAP_ALLOCATE|[139]|Bytes:6
08:17:01.0 (4766801)|HEAP_ALLOCATE|[EXTERNAL]|Bytes:578
08:17:01.0 (36070965)|FATAL_ERROR|Internal Salesforce.com Error
08:17:01.36 (36112562)|CUMULATIVE_LIMIT_USAGE

This then appears to relate to the developer script exception email:

Developer script exception from XXX: 'MyAdaptiveBatch': SELECT firstname, lastname, payment_account__c, id FROM Contact
^ ERROR at Row:1:Column:29 No such column 'payment_account__c' on entity 'Contact'. If you are attempting to use a custom field, be sure to append the '__c' after the custom field name. Please reference your WSDL or the describe call for the appropriate names

It seems that this is not, therefore, due to the Batchable methods being public (rather than global), which is a really good thing. At least I can address this behind the API by making sure namespaces are applied explicitly in the SOQL query fields.
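
A minimal sketch of the fix, assuming the package namespace is sirenum (as it appears in the log entries above) and that the query is built as a string before being passed to Database.getQueryLocator. The real query in our package is built differently, so this only shows the shape of the change.

```apex
// Assumption: 'sirenum' is the package namespace, as seen in the debug log.
// The packaged custom field is referenced with its explicit namespace prefix
// so that it resolves regardless of which namespace the queued class is in.
String query =
    'SELECT FirstName, LastName, sirenum__Payment_Account__c, Id FROM Contact';

Database.QueryLocator locator = Database.getQueryLocator(query);
```

Standard fields such as FirstName and Id need no prefix; only fields that belong to the package do.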

It is strange that it trips up in this specific context and not otherwise. Our other batches, fully implemented in the package, don't require the namespace prefix. This batch, whose execute method is fully implemented in the package but whose queued class is outside the package, fails, and only in execute, not in start. This inconsistent start/execute behaviour is very strange, and the fact that it is reported as an internal error with no useful log detail doesn't help.
