I think using Batch Apex chaining is closer to what you're aiming for here, though it is not as bulletproof as a scheduled job that reviews records which have yet to have a geocode calculated. While the scheduled approach is not as immediate, it does, as you've said, give more predictable processing (though you may still need to consider schedule overlaps) and has some built-in error handling and retry semantics.
That said, some notes on the chaining approach...
- Ensuring there can be only one! You could query AsyncApexJob by class Id to determine whether a job is already running before starting another. There is still a small concurrency window, though: you cannot guarantee that another parallel trigger invocation does not make the same query at that precise moment and arrive at the same answer (there is no lock on the AsyncApexJob records). This becomes quite likely if you're hammering in a lot of Accounts. If this concerns you, use a custom object as a semaphore: give it a field with a unique constraint, which prevents inserting duplicate records. If you fail to insert into it prior to attempting to start the job, take that as a sign the job is still running. When your job is completely done (see below), the record is removed. A sketch of this appears after the list.
- Chaining the jobs. As you mentioned, in the finish method you can start a new batch job if you determine new unprocessed Accounts have been inserted in the meantime. Be sure not to delete the semaphore record until you know there is no more work to do, since other trigger invocations may attempt to start a new job in the gap between the current job ending and the new one starting. There is one other consideration here as well: a maximum of five batch jobs can be queued/running in the org, so the chain could get broken. One solution is to query as per this answer and schedule a job in the future (using the newer scheduled-batch feature) to try again; see this answer for more detail: Cascading batch jobs.
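To make the semaphore and chaining mechanics concrete, here is a minimal sketch. The Geocode_Job_Semaphore__c object (with a unique text field Key__c), the Geocoded__c checkbox on Account, and the class and method names are all hypothetical stand-ins for your actual schema:

```apex
// Hypothetical names throughout: Geocode_Job_Semaphore__c has a unique
// text field Key__c; Geocoded__c is a checkbox on Account marking records
// that have already been processed.
public class GeocodeAccountsBatch
        implements Database.Batchable<SObject>, Database.AllowsCallouts {

    public Database.QueryLocator start(Database.BatchableContext ctx) {
        return Database.getQueryLocator(
            [SELECT Id, BillingStreet FROM Account WHERE Geocoded__c = false]);
    }

    public void execute(Database.BatchableContext ctx, List<SObject> scope) {
        for (Account a : (List<Account>) scope) {
            // ... call the geocoding service for the address here ...
            a.Geocoded__c = true; // mark as processed
        }
        update scope;
    }

    public void finish(Database.BatchableContext ctx) {
        // New Accounts may have arrived while this job ran: chain another job,
        // keeping the semaphore record until there is genuinely no work left.
        Integer remaining =
            [SELECT COUNT() FROM Account WHERE Geocoded__c = false];
        if (remaining > 0) {
            Database.executeBatch(new GeocodeAccountsBatch());
        } else {
            delete [SELECT Id FROM Geocode_Job_Semaphore__c
                    WHERE Key__c = 'geocode'];
        }
    }

    // Called from the trigger. Only the invocation that wins the insert race
    // (the unique constraint on Key__c rejects duplicates) starts the job.
    public static void startIfNotRunning() {
        try {
            insert new Geocode_Job_Semaphore__c(Key__c = 'geocode');
            Database.executeBatch(new GeocodeAccountsBatch());
        } catch (DmlException e) {
            // Duplicate key: a job is already queued or running, do nothing.
        }
    }
}
```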
Risk management. As you can see, there is some risk in the above; it's a matter of judging the likelihood against the overhead on your users of resolving the effects of the processing not occurring. For example, you could have a button on the Account page as a fallback to calculate as and when needed. Or have a scheduled job act as a secondary sweeper, kicking off your job every day to check for records that have not been processed (again using the semaphore). That gives users a near-real-time update, with the added security of knowing the sweeper will pick up cases where the chaining fails due to fringe conditions such as the batch Apex job governor and/or the remaining concurrency issues in the solution.
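A matching sketch of the secondary sweeper, reusing the hypothetical startIfNotRunning helper from the sketch above:

```apex
// Daily sweeper: safe to run even while a chained job is active, because
// startIfNotRunning relies on the semaphore to avoid a duplicate start.
global class GeocodeSweeper implements Schedulable {
    global void execute(SchedulableContext ctx) {
        GeocodeAccountsBatch.startIfNotRunning();
    }
}
// One-off setup, e.g. from anonymous Apex (runs at 2am every day):
// System.schedule('Geocode sweeper', '0 0 2 * * ?', new GeocodeSweeper());
```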
When all is said and done... at the end of the day, triggers and batch jobs (or @future for that matter) don't mix that well. The advice in the Apex docs to think carefully about using them together is a little too tame once you really start to think through all the considerations of starting jobs from a trigger context. That said, if you know and accept the potential points of failure, and have some plan to observe and address them if they occur, it can be acceptable.
> Use extreme care if you are planning to invoke a batch job from a trigger. You must be able to guarantee that the trigger will not add more batch jobs than the five that are allowed. In particular, consider API bulk updates, import wizards, mass record changes through the user interface, and all cases where more than one record can be updated at a time.
Hope this helps!
To investigate this further, I created a new dev org to play in, installed my package, created a specialist implementation of my AdaptiveBatch API, and tried out various types of processing.
I stumbled upon the answer to the issue because I started receiving Developer Script Exception emails - I was not the recipient of these on the sandbox where we first saw this problem.
The logs confirm that the async executions for the batch's start and finish methods were triggered and processed fine; the processing failed with the strange internal error only during initialization of the execute method's async execution.
The start method returns a query locator (SOQL). Where this references one of our package's custom fields, the namespace prefix was not specified. The processing seems to handle this fine, as it always has historically across our code base, with log entries like:
```
08:17:01.0 (183716704)|SOQL_EXECUTE_BEGIN|[188]|Aggregations:0|SELECT firstname, lastname, payment_account__c, id FROM Contact
08:17:01.0 (219117528)|SOQL_EXECUTE_END|[188]|Rows:2
08:17:01.0 (219380886)|METHOD_EXIT|[20]|01p4J000003gjQd|sirenum.AdaptiveBatch.start(Database.BatchableContext)
```
You can see from this that (in my test org) there are two rows selected, and these include the "payment_account__c" field being queried. That field is actually from our package, but is named without the prefix.
However, when execute is to be invoked the log simply contains something like:
```
08:17:01.0 (378428)|CODE_UNIT_STARTED|[EXTERNAL]|01p4J000003gkWc|MyAdaptiveBatch
08:17:01.0 (3495717)|HEAP_ALLOCATE|[72]|Bytes:3
08:17:01.0 (3577878)|HEAP_ALLOCATE|[77]|Bytes:152
08:17:01.0 (3594977)|HEAP_ALLOCATE|[342]|Bytes:408
08:17:01.0 (3608691)|HEAP_ALLOCATE|[355]|Bytes:408
08:17:01.0 (3622224)|HEAP_ALLOCATE|[467]|Bytes:48
08:17:01.0 (3650375)|HEAP_ALLOCATE|[139]|Bytes:6
08:17:01.0 (4766801)|HEAP_ALLOCATE|[EXTERNAL]|Bytes:578
08:17:01.0 (36070965)|FATAL_ERROR|Internal Salesforce.com Error
08:17:01.36 (36112562)|CUMULATIVE_LIMIT_USAGE
```
This then appears to relate to the developer script exception email:
```
Developer script exception from XXX:
'MyAdaptiveBatch':
SELECT firstname, lastname, payment_account__c, id FROM Contact
                            ^
ERROR at Row:1:Column:29
No such column 'payment_account__c' on entity 'Contact'.
If you are attempting to use a custom field, be sure to append the '__c' after
the custom field name. Please reference your WSDL or the describe call for the
appropriate names
```
It seems, therefore, that this is not due to the Batchable methods being public (rather than global), which is a really good thing. At least I can address this behind the API by making sure namespace prefixes are applied explicitly to the fields in the SOQL query.
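Concretely, the fix amounts to qualifying the packaged field in the locator query. A sketch, inferring the sirenum prefix from the METHOD_EXIT log line above (the exact field API name is an assumption):

```apex
public Database.QueryLocator start(Database.BatchableContext ctx) {
    // Explicitly qualify the packaged custom field with the namespace prefix;
    // 'sirenum__Payment_Account__c' is inferred from the logs and may not be
    // the exact API name.
    return Database.getQueryLocator(
        'SELECT FirstName, LastName, sirenum__Payment_Account__c, Id ' +
        'FROM Contact');
}
```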
It is strange that it trips up in this specific context and not otherwise. Our other batches fully implemented in the package don't require the namespace prefix, but this batch, whose execute method is fully implemented in the package but whose actual queued class lives outside the package, fails, and only in execute, not in start. This inconsistent start/execute behaviour is very strange, and the fact that it is reported as an internal error with no useful log detail doesn't help.
Best Answer
This assumes you don't have too many records, that is, you stay within the synchronous SOQL query limit of 50,000 rows. If you have more than that, you will probably need Batch Apex to handle this scenario.
Apex code:
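The original snippet was not preserved in this copy, so purely as an illustration, here is a sketch of what a non-batch pass might look like, reusing the hypothetical Geocoded__c flag from the earlier sketches:

```apex
// Illustrative only: a synchronous pass that stays within the 50,000-row
// SOQL limit. Geocoded__c is a hypothetical checkbox marking processed rows.
List<Account> pending = [SELECT Id, BillingStreet, BillingCity, BillingPostalCode
                         FROM Account
                         WHERE Geocoded__c = false
                         LIMIT 50000];
for (Account a : pending) {
    // ... look up / compute the geocode for the address here ...
    a.Geocoded__c = true;
}
update pending;
```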