I need to geocode addresses on Account using an HTTP callout; the web service will only service one address per callout. For this use case it is preferable to geocode the accounts immediately, rather than periodically via a scheduled batch. To do so, I have a class that implements Database.Batchable and Database.AllowsCallouts, which takes a list of account ids in the constructor. start() returns a QueryLocator for the needed account fields, and execute() makes the callouts. My trigger (on Account after insert, after update) calls Database.executeBatch() with a batch size of 10, to stay under the callout limit.
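For readers following along, the setup described above might look roughly like the sketch below. The class and trigger names, the address fields, the `Geocoder` Named Credential, and the `q` query parameter are all assumptions for illustration, not the asker's actual code.

```apex
// AccountGeocodeBatch.cls (hypothetical name)
public class AccountGeocodeBatch implements Database.Batchable<SObject>, Database.AllowsCallouts {
    private final List<Id> accountIds;

    public AccountGeocodeBatch(List<Id> accountIds) {
        this.accountIds = accountIds;
    }

    public Database.QueryLocator start(Database.BatchableContext bc) {
        // Only the fields needed to build the address string.
        return Database.getQueryLocator([
            SELECT Id, BillingStreet, BillingCity, BillingState, BillingPostalCode
            FROM Account
            WHERE Id IN :accountIds
        ]);
    }

    public void execute(Database.BatchableContext bc, List<Account> scope) {
        for (Account a : scope) {
            // Null handling omitted for brevity.
            String address = a.BillingStreet + ', ' + a.BillingCity + ', '
                + a.BillingState + ' ' + a.BillingPostalCode;

            HttpRequest req = new HttpRequest();
            // Assumption: a Named Credential called Geocoder and a "q" parameter.
            req.setEndpoint('callout:Geocoder/geocode?q='
                + EncodingUtil.urlEncode(address, 'UTF-8'));
            req.setMethod('GET');
            HttpResponse res = new Http().send(req);
            // Parsing res.getBody() and saving the result depends on the provider.
        }
    }

    public void finish(Database.BatchableContext bc) {}
}

// AccountGeocode.trigger (hypothetical name)
trigger AccountGeocode on Account (after insert, after update) {
    // One batch job per trigger invocation, i.e. per chunk of up to 200 records.
    Database.executeBatch(
        new AccountGeocodeBatch(new List<Id>(Trigger.newMap.keySet())), 10);
}
```

If execute() writes the coordinates back to Account, the after update trigger fires again from inside the batch, so a real implementation would need a guard (for example, only reacting to address changes).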
This works perfectly for data loads up to 200 records. Once I get over that size, of course, the system splits the records into multiple chunks and invokes the trigger for each chunk (of up to 200 records), so I end up with concurrent executions of my Batchable class (as each trigger invocation is calling Database.executeBatch()), and then I get Rate Limit Exceeded errors from my geocoding service provider.
I'm considering changing my trigger logic a bit to query for running/queued batches; if any are found, to use System.scheduleBatch() instead of Database.executeBatch(), scaling the minutesFromNow param based on the number of batches found. Given that scheduled batch delays are guidelines only, this isn't a guarantee that two batches won't run concurrently, but it's possible this could be a 'good enough' solution for my particular case with the right delay param. If not, from there my next option is a separate SObject to track to-be-run batches, and logic in finish() to run the next batch (perhaps similar to this question, though perhaps without the controlling job).
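A minimal sketch of the first idea, assuming the Batchable class is named `AccountGeocodeBatch` (a hypothetical name) and that five minutes per outstanding job is a reasonable spacing. The `GeocodeBatchLauncher` class, the `Geocode ` job-name prefix, and the delay constant are all illustrative choices. Pending scheduled batches are counted through CronTrigger as well, on the assumption that they do not show up as BatchApex rows in AsyncApexJob until they actually start.

```apex
// Hypothetical helper called from the trigger instead of Database.executeBatch().
public class GeocodeBatchLauncher {
    // Assumption: spacing per outstanding job; tune to the provider's rate limit.
    private static final Integer MINUTES_PER_JOB = 5;

    public static void launch(List<Id> accountIds) {
        // Batch jobs of this class that are queued or running right now.
        Integer active = [
            SELECT COUNT() FROM AsyncApexJob
            WHERE JobType = 'BatchApex'
            AND ApexClass.Name = 'AccountGeocodeBatch'
            AND Status IN ('Holding', 'Queued', 'Preparing', 'Processing')
        ];
        // Batches already scheduled by this helper but not yet started.
        Integer scheduled = [
            SELECT COUNT() FROM CronTrigger
            WHERE CronJobDetail.Name LIKE 'Geocode %'
            AND State = 'WAITING'
        ];
        Integer pending = active + scheduled;

        AccountGeocodeBatch job = new AccountGeocodeBatch(accountIds);
        if (pending == 0) {
            Database.executeBatch(job, 10);
        } else {
            // Scheduled job names must be unique.
            String name = 'Geocode ' + Datetime.now().getTime() + '-' + pending;
            System.scheduleBatch(job, name, pending * MINUTES_PER_JOB, 10);
        }
    }
}
```

As the question already notes, the delay is only a guideline, so this spaces the jobs out rather than strictly serializing them.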
Before I start complicating things, is there a simple way to prevent multiple instances of a given Batchable class from running concurrently? Or a simpler way of serializing the batches than writing my own scheduler?
Best Answer
I think Batch Apex chaining is closer to what you're aiming for here, though it is not as bulletproof as a scheduled job that reviews records whose geocode has yet to be calculated. While the scheduled job is not as immediate, it does, as you've said, give more predictable processing (you may still need to consider schedule overlaps) and has some built-in error handling and retry semantics.
That said, some notes on the chaining approach...
Risk Management. As you can see, there is some risk in the above; it's a matter of judging the likelihood against the overhead for the users of resolving the effects of the processing not occurring. For example, you could have a button on the Account page as a fallback to calculate as and when needed. Or have a scheduled job act as a secondary sweeper, kicking off your job every day to check for records that have not been processed (again using the semaphore). This gives the users a near-real-time update, with the added security of knowing the sweeper will pick up instances where the chaining fails due to fringe cases such as Batch Apex job governor limits and/or remaining concurrency issues in the solution.
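A rough sketch of such a sweeper, under several assumptions: `AccountGeocodeBatch` stands in for the asker's Batchable class, `Geocoded__c` is a hypothetical checkbox marking processed accounts, and a simple AsyncApexJob check stands in for the semaphore.

```apex
// Hypothetical daily sweeper for accounts the trigger-started jobs missed.
public class GeocodeSweeper implements Schedulable {
    public void execute(SchedulableContext sc) {
        // Stand-in for the semaphore: skip this run if a geocode batch is active.
        Integer active = [
            SELECT COUNT() FROM AsyncApexJob
            WHERE JobType = 'BatchApex'
            AND ApexClass.Name = 'AccountGeocodeBatch'
            AND Status IN ('Holding', 'Queued', 'Preparing', 'Processing')
        ];
        if (active > 0) {
            return;
        }

        // Assumption: Geocoded__c is false until an account has been processed.
        Map<Id, Account> missed = new Map<Id, Account>([
            SELECT Id FROM Account WHERE Geocoded__c = false LIMIT 2000
        ]);
        if (!missed.isEmpty()) {
            Database.executeBatch(
                new AccountGeocodeBatch(new List<Id>(missed.keySet())), 10);
        }
    }
}
```

It could be scheduled once from anonymous Apex, for example `System.schedule('Geocode sweeper', '0 0 2 * * ?', new GeocodeSweeper());` to run daily at 2 AM.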
When all is said and done... At the end of the day, triggers and batch jobs (or @future for that matter) don't mix that well. The advice in the Apex docs to think carefully about using them is a little too tame when you really start to think about all the considerations of starting jobs from a trigger context. That said, if you know and accept the potential points of failure, and have some plan to observe and address them if they occur, it can be acceptable.
Hope this helps!