Rather than scheduling a one-time job, schedule a recurring job.
Schedule the job to run every hour. As part of the finish phase of your job, cancel this hourly schedule and replace it with a similar hourly schedule whose first execution is set a short period (say, 5 minutes) after the job finishes.
This works in much the same way as a "one off" schedule (as per your existing implementation): in both cases the job is rescheduled in the finish phase. By using a recurring schedule, however, you have the added benefit that if for any reason the job does not execute, the platform will attempt to run it again an hour later, and every hour thereafter until it succeeds.
Note that we don't know why the job may fail to execute - but we're assuming that it relates to platform maintenance. Chaining one-off scheduled jobs together relies on the successful start and completion of each job for the integrity of the chain, whereas using a recurring scheduled job provides "auto-resume" behaviour regardless of the successful start / completion of an individual job.
Example process flow:
(1) at 12:00 we schedule a job to run every hour, at 5 minutes
past the hour: 12:05, 13:05, 14:05... etc.
(2) at 12:05 the batch manager job is started according to the hourly
schedule, and this checks your custom batch job object records to see
if there is any work currently running or waiting.
It finds that there are no jobs running but there is a job waiting:
"Foo". The batch manager therefore starts the batch process for Foo.
(3) at 13:05 the batch manager job is started according to the hourly
schedule.
On this occasion it finds that job Foo is in progress and so quits
without taking any action.
(4) at 13:35 job Foo finishes.
In the finish phase, the existing hourly scheduled job is cancelled,
and a new hourly job is scheduled, this time to run at 40
minutes past the hour: 13:40, 14:40, 15:40... etc.
(5) at 13:40 the batch manager job is due to start according to the
hourly schedule, but this fails (we assume because of platform
maintenance)
(6) at 14:40 the batch manager job is started according to the hourly
schedule.
It finds that there are no jobs running but there is a job waiting: "Bar". The batch manager therefore starts the batch process for Bar.
etc.
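The flow above might be sketched in Apex roughly as follows. This is a minimal sketch, not a complete implementation: `BatchManagerSchedulable` and the job name 'Batch Manager' are assumed names, and the body of the manager (checking your custom batch job object records) is left as a comment.

```apex
// Hypothetical sketch of the batch manager. The Schedulable checks the
// custom batch job records and starts any waiting work; the batch job's
// finish phase re-creates the hourly schedule offset from its own end time.
public class BatchManagerSchedulable implements Schedulable {
    public void execute(SchedulableContext sc) {
        // Check the custom batch job object records here and start the
        // batch process if work is waiting and nothing is running.
    }
}

// Initial setup: run at 5 minutes past every hour
// (cron fields: seconds minutes hours day-of-month month day-of-week)
System.schedule('Batch Manager', '0 5 * * * ?', new BatchManagerSchedulable());

// In the batch job's finish(Database.BatchableContext bc) method:
// abort the existing schedule and re-create it ~5 minutes from now.
for (CronTrigger ct : [SELECT Id FROM CronTrigger
                       WHERE CronJobDetail.Name = 'Batch Manager']) {
    System.abortJob(ct.Id);
}
Integer minute = Math.mod(Datetime.now().minute() + 5, 60);
System.schedule('Batch Manager', '0 ' + minute + ' * * * ?',
                new BatchManagerSchedulable());
```

Note that because the re-created schedule is still recurring, a missed 13:40 run (step 5) is picked up automatically at 14:40 without any extra code.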
I think using Batch Apex chaining is closer to what you're aiming for here, though it is not as bulletproof as a scheduled job that reviews records which have yet to have a geocode calculated. While chaining is not as immediate, it does, as you've said, give more predictable processing (you may still need to consider schedule overlaps) and has some built-in error handling and retry semantics.
That said, some notes on the chaining approach...
- Ensuring there can be only one! You could query AsyncApexJob by class Id to determine if a job is already running before starting another. There is still a small concurrency issue, though: you cannot guarantee that at that precise moment another parallel trigger invocation does not make the same query and arrive at the same answer (since there is no lock on the AsyncApexJob records). This could be quite likely if you're hammering in a lot of Accounts. If you are concerned about this, you could use a custom object as a semaphore: give it a field with a unique constraint, which prevents the insert if a duplicate record exists. If you fail to insert into it prior to attempting to start the job, you can take this as a sign the job is still running. When your job is completely done (see below) this record is removed.
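A sketch of the semaphore idea, assuming a custom object `Batch_Semaphore__c` with a unique external-ID text field `Key__c` (both names are illustrative):

```apex
// Hypothetical semaphore acquisition. The unique constraint on Key__c
// makes the insert fail if another transaction already holds the lock.
public static Boolean acquireSemaphore() {
    try {
        insert new Batch_Semaphore__c(Key__c = 'GeocodeBatch');
        return true;   // we own the semaphore; safe to start the job
    } catch (DmlException e) {
        return false;  // duplicate key: a job is already running
    }
}

// In the batch finish() method, once no more work remains:
// delete [SELECT Id FROM Batch_Semaphore__c WHERE Key__c = 'GeocodeBatch'];
```

Unlike the AsyncApexJob query, the unique constraint is enforced by the database, so two parallel trigger invocations cannot both "win".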
- Chaining the jobs. As you mentioned, in the finish method you can start a new batch job if you determine new unprocessed Accounts have been inserted in the meantime. Be sure not to delete the semaphore record, though, until you know you have no more work to do, as other trigger invocations may attempt to start a new job during the transition between the current job ending and the new one starting. There is one other consideration here as well: only a maximum of 5 batch jobs can be queued/running in the org, so the chain could get broken. One solution is to query the number of queued/running jobs and, if the limit is reached, schedule a job in the future (using the batch schedule feature) to try again; see the answer on Cascading batch jobs for more detail.
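The finish-method chaining with the queue-limit fallback could look roughly like this. `Geocoded__c`, `GeocodeAccountsBatch` and `GeocodeRetrySchedulable` are assumed names standing in for your own field, batch class and retry Schedulable:

```apex
// Hypothetical finish(): chain a new job only if unprocessed Accounts
// remain, and fall back to a one-off scheduled retry when the org's
// concurrent batch limit (5 queued/active jobs) is reached.
public void finish(Database.BatchableContext bc) {
    Integer pending = [SELECT COUNT() FROM Account WHERE Geocoded__c = false];
    if (pending == 0) {
        // All done: release the semaphore so future triggers can start jobs
        delete [SELECT Id FROM Batch_Semaphore__c WHERE Key__c = 'GeocodeBatch'];
        return;
    }
    Integer active = [SELECT COUNT() FROM AsyncApexJob
                      WHERE JobType = 'BatchApex'
                        AND Status IN ('Queued', 'Processing', 'Preparing')];
    if (active < 5) {
        Database.executeBatch(new GeocodeAccountsBatch());
    } else {
        // Queue full: schedule a retry a few minutes out instead
        Datetime next = Datetime.now().addMinutes(5);
        String cron = '0 ' + next.minute() + ' ' + next.hour() + ' '
                    + next.day() + ' ' + next.month() + ' ? ' + next.year();
        System.schedule('Geocode retry ' + next.getTime(), cron,
                        new GeocodeRetrySchedulable());
    }
}
```

Note the semaphore is only deleted on the "no work left" path, per the point above about the transition window between jobs.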
- Risk management. As you can see there is some risk in the above; it's a matter of judging the likelihood against the overhead for your users of resolving the effects of the processing not occurring. For example, you could have a button on the Account page as a fallback to calculate as and when needed. Or you could have a scheduled job act as a secondary sweeper, kicking off your job every day to check for records that have not been processed (again using the semaphore). This gives the users a near-realtime update, with the added security of knowing the sweeper will pick up instances where the chaining fails due to fringe cases such as the batch Apex job governor and/or remaining concurrency issues in the solution.
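The sweeper itself can be very small. A sketch, again using the assumed `Geocoded__c` field, `GeocodeAccountsBatch` class and an `acquireSemaphore()` helper as described above:

```apex
// Hypothetical daily sweeper: restarts the geocode batch if unprocessed
// Accounts remain and no other job currently holds the semaphore.
public class GeocodeSweeper implements Schedulable {
    public void execute(SchedulableContext sc) {
        Integer pending = [SELECT COUNT() FROM Account WHERE Geocoded__c = false];
        if (pending > 0 && acquireSemaphore()) {
            Database.executeBatch(new GeocodeAccountsBatch());
        }
    }
}

// Run the sweeper once a day at 02:00
System.schedule('Geocode Sweeper', '0 0 2 * * ?', new GeocodeSweeper());
```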
When all is said and done, triggers and batch jobs (or @future for that matter) don't mix that well. The advice in the Apex docs to think carefully about using them together is a little too tame once you really start to think through all the considerations of starting jobs from a trigger context. That said, if you know and accept the potential points of failure, and have some plan to observe and address them if they occur, it can be acceptable.
Use extreme care if you are planning to invoke a batch job from a trigger. You must be able to guarantee that the trigger will not add more batch jobs than the five that are allowed. In particular, consider API bulk updates, import wizards, mass record changes through the user interface, and all cases where more than one record can be updated at a time.
Hope this helps!
Basically it's an async job. Asynchronous Apex executes when resources become available on the Salesforce servers, so it's normal that it sometimes takes time.
There can be one more reason: your code. If it's not optimised, it may take time.
Also, if this was only for simple deletion, using the Bulk API (or Data Loader with the Bulk API enabled) would have processed your job more quickly.