After finally speaking to someone in Salesforce support who understood my point, I was given this explanation:
'Actual execution might be delayed based on service availability' means that the batch is enqueued for the interval initially set by the scheduleBatch method. 'Service availability' is not a reference to the concurrent batch count freeing up. This feature was not designed to manage the limit in question.
The rep also explained that System.scheduleBatch was provided only to remove the need for an intermediate class that implements the Schedulable interface and calls Database.executeBatch().
On that basis, the documentation might better state:
• When you call System.scheduleBatch, Salesforce schedules the job for execution at the specified time. Actual execution might be delayed based on service availability, and if no concurrent batch slots are available at the point of execution, the job will not run at all (ever), nor will you be notified that it hasn't run.
This is not acceptable for my use case: I cannot rely on this method when a batch job absolutely must run. For that, I will continue to use my own tried-and-tested pattern :)
Rather than scheduling a one-time job, schedule a recurring job.
Schedule the job to run on an hourly interval. In the finish phase of your job, cancel this hourly schedule and replace it with a similar hourly schedule whose first execution is a short period (say, 5 minutes) after the job finishes.
This works much like a one-off schedule (as in your existing implementation): in both cases the job is rescheduled in the finish phase. A recurring schedule, however, has the added benefit that if for any reason the job does not execute, the platform will attempt to run it again an hour later, and every hour after that until it succeeds.
Note that we don't know why the job may fail to execute, but we're assuming that it relates to platform maintenance. Chaining one-off scheduled jobs together relies on each job starting and completing successfully to keep the chain intact, whereas a recurring scheduled job provides "auto-resume" behaviour regardless of whether an individual job starts or completes.
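The difference between the two approaches can be modelled in plain Java (an illustration, not Apex or any Salesforce API). The sketch assumes one firing per hourly slot and treats a missed slot as a firing that never starts; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ScheduleResilience {
    // One-off chaining: each run schedules the next one in its finish phase.
    // If a firing is missed, nothing schedules the next run, so the chain stops for good.
    static List<Integer> oneOffChain(int slots, Set<Integer> missedSlots) {
        List<Integer> runs = new ArrayList<>();
        for (int slot = 0; slot < slots; slot++) {
            if (missedSlots.contains(slot)) {
                break; // the run never happened, so the next one-off job was never scheduled
            }
            runs.add(slot);
        }
        return runs;
    }

    // Recurring schedule: the platform fires every slot regardless of earlier outcomes,
    // so a missed firing is simply retried at the next hourly slot.
    static List<Integer> recurring(int slots, Set<Integer> missedSlots) {
        List<Integer> runs = new ArrayList<>();
        for (int slot = 0; slot < slots; slot++) {
            if (!missedSlots.contains(slot)) {
                runs.add(slot);
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        Set<Integer> missed = Set.of(2); // assume the third firing is lost to maintenance
        System.out.println(oneOffChain(6, missed)); // [0, 1]
        System.out.println(recurring(6, missed));   // [0, 1, 3, 4, 5]
    }
}
```

With the third slot missed, the one-off chain stops after two runs, while the recurring schedule picks up again at the next slot.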
Example process flow:
(1) at 12:00 we schedule a job to run every hour, at 5 minutes past the hour: 12:05, 13:05, 14:05, etc.
(2) at 12:05 the batch manager job is started according to the hourly schedule, and this checks your custom batch job object records to see if there is any work currently running or waiting. It finds that there are no jobs running but there is a job waiting: "Foo". The batch manager therefore starts the batch process for Foo.
(3) at 13:05 the batch manager job is started according to the hourly schedule. On this occasion it finds that job Foo is in progress, and so it quits without taking any action.
(4) at 13:35 job Foo finishes. In the finish phase, the existing hourly scheduled job is cancelled, and another new hourly job is scheduled, this time to run at 40 minutes past the hour: 13:40, 14:40, 15:40, etc.
(5) at 13:40 the batch manager job is due to start according to the hourly schedule, but this fails (we assume because of platform maintenance).
(6) at 14:40 the batch manager job is started according to the hourly schedule. It finds that there are no jobs running but there is a job waiting: "Bar". The batch manager therefore starts the batch process for Bar.
etc.
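The batch manager's decision at each firing, together with the finish-phase rescheduling from step (4), can be sketched in plain Java. This is a model rather than Apex: JobRecord stands in for the custom batch job object, and the names, statuses and DELAY_MINUTES are hypothetical. In Apex the manager would presumably implement Schedulable and call Database.executeBatch where the comment indicates, and the cron string would be passed to System.schedule after the old schedule has been cancelled with System.abortJob.

```java
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.List;

public class BatchManager {
    enum Status { WAITING, RUNNING, DONE }

    // Stand-in for a record on the custom batch job object (hypothetical shape).
    static final class JobRecord {
        final String name;
        Status status;

        JobRecord(String name, Status status) {
            this.name = name;
            this.status = status;
        }
    }

    // Hypothetical gap between a job finishing and the first firing of the new schedule.
    static final int DELAY_MINUTES = 5;

    // Run on every hourly firing: quit if anything is running, otherwise start the
    // first waiting job. Returns the name of the job started, or null for no action.
    static String onHourlyFire(List<JobRecord> jobs) {
        for (JobRecord job : jobs) {
            if (job.status == Status.RUNNING) {
                return null; // work in progress, take no action
            }
        }
        for (JobRecord job : jobs) {
            if (job.status == Status.WAITING) {
                job.status = Status.RUNNING; // in Apex, Database.executeBatch would be called here
                return job.name;
            }
        }
        return null; // nothing running and nothing waiting
    }

    // Finish phase: cron expression (Salesforce format: Seconds Minutes Hours
    // Day_of_month Month Day_of_week) for the replacement hourly schedule, firing
    // every hour at the minute that falls DELAY_MINUTES after the finish time.
    static String hourlyCronAfter(LocalTime finishedAt) {
        int minute = finishedAt.plusMinutes(DELAY_MINUTES).getMinute();
        return "0 " + minute + " * * * ?";
    }

    public static void main(String[] args) {
        List<JobRecord> jobs = new ArrayList<>();
        JobRecord foo = new JobRecord("Foo", Status.WAITING);
        jobs.add(foo);

        System.out.println(onHourlyFire(jobs)); // 12:05: starts Foo
        System.out.println(onHourlyFire(jobs)); // 13:05: Foo still running, prints null

        foo.status = Status.DONE; // 13:35: Foo finishes
        System.out.println(hourlyCronAfter(LocalTime.of(13, 35))); // 0 40 * * * ?

        // Bar is assumed to have been queued at some point before 14:40.
        jobs.add(new JobRecord("Bar", Status.WAITING));
        // The 13:40 firing is assumed lost to maintenance, so no call is made for it.
        System.out.println(onHourlyFire(jobs)); // 14:40: starts Bar
    }
}
```

Keeping the "is anything running?" check inside the scheduled job is what makes a retried or duplicate firing harmless: it simply quits, as in step (3).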
To clean up the list, you can purge the old jobs:
You should only call System.abortJob to cancel jobs that are still scheduled to run in the future.
In the future, if you do need to abort a CronTrigger, use the Id value, not the CronJobDetailId:
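The rule about which value to pass to System.abortJob can be modelled in plain Java (Java 16+ for the record type). CronTriggerRow is a hypothetical stand-in for a CronTrigger row; the real object does have Id, CronJobDetailId and NextFireTime fields, but treating "has a future" as "NextFireTime is not null" is an assumption of this sketch. In Apex the same filter would presumably be a SOQL query on CronTrigger whose Id values are passed to System.abortJob.

```java
import java.time.LocalDateTime;
import java.util.List;
import java.util.stream.Collectors;

public class AbortTargets {
    // Plain-Java stand-in for the CronTrigger fields discussed above; not the real sObject.
    record CronTriggerRow(String id, String cronJobDetailId, LocalDateTime nextFireTime) {}

    // Keeps only triggers that still have a future firing (assumed here to mean a
    // non-null nextFireTime) and returns their Id values, which is what would be
    // handed to System.abortJob. The cronJobDetailId is deliberately ignored.
    static List<String> idsToAbort(List<CronTriggerRow> triggers) {
        return triggers.stream()
                .filter(t -> t.nextFireTime() != null)
                .map(CronTriggerRow::id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<CronTriggerRow> rows = List.of(
                new CronTriggerRow("trigger-1", "detail-1", LocalDateTime.of(2030, 1, 1, 0, 0)), // still scheduled
                new CronTriggerRow("trigger-2", "detail-2", null));                               // no future run
        System.out.println(idsToAbort(rows)); // [trigger-1]
    }
}
```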