Rather than scheduling a one-time job, schedule a recurring job.
Schedule the job to run hourly. In the finish phase of your job, cancel this hourly schedule and replace it with a similar hourly schedule whose first execution is a short period (say 5 minutes) after the job finishes.
This works much like the "one-off" schedule in your existing implementation: in both cases the job is rescheduled in the finish phase. The added benefit of a recurring schedule is that if the job fails to execute for any reason, the platform will attempt to run it again an hour later, and every hour after that until it succeeds.
Note that we don't know why the job may fail to execute - but we're assuming that it relates to platform maintenance. Chaining one-off scheduled jobs together relies on the successful start and completion of each job for the integrity of the chain, whereas using a recurring scheduled job provides "auto-resume" behaviour regardless of the successful start / completion of an individual job.
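A minimal Apex sketch of the finish-phase rescheduling described above. The class names BatchManagerScheduler and FooBatch, the job name, and the placeholder query are assumptions made for illustration; they are not taken from your existing implementation, so adapt them to your own classes.

```apex
// Hypothetical scheduler class; in an org, each top-level class lives in its own file.
public class BatchManagerScheduler implements Schedulable {
    // Assumed job name, used to find and cancel the current hourly schedule.
    public static final String JOB_NAME = 'Batch Manager (hourly)';

    public void execute(SchedulableContext sc) {
        // Check your custom batch job records here and start any waiting work.
    }

    // Cancel the existing hourly schedule and create a new one whose first
    // run falls roughly delayMinutes from now.
    public static void reschedule(Integer delayMinutes) {
        // Scheduled jobs are aborted via their CronTrigger Id.
        for (CronTrigger ct : [SELECT Id FROM CronTrigger
                               WHERE CronJobDetail.Name = :JOB_NAME]) {
            System.abortJob(ct.Id);
        }
        Integer minute = Datetime.now().addMinutes(delayMinutes).minute();
        // Cron fields: seconds minutes hours day-of-month month day-of-week.
        // This expression fires once an hour, at the computed minute.
        String cron = '0 ' + minute + ' * * * ?';
        System.schedule(JOB_NAME, cron, new BatchManagerScheduler());
    }
}

// Hypothetical batch class standing in for job "Foo".
public class FooBatch implements Database.Batchable<SObject> {
    public Database.QueryLocator start(Database.BatchableContext bc) {
        // Placeholder query; replace with the records Foo actually processes.
        return Database.getQueryLocator('SELECT Id FROM Account');
    }

    public void execute(Database.BatchableContext bc, List<SObject> scope) {
        // Process the records in scope.
    }

    public void finish(Database.BatchableContext bc) {
        // Replace the hourly schedule so the next run is about 5 minutes away.
        BatchManagerScheduler.reschedule(5);
    }
}
```

Because the replacement schedule is itself hourly, a missed run at the new minute is simply retried an hour later without any further code.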
Example process flow:
(1) at 12:00 we schedule a job to run every hour, at 5 minutes
past the hour: 12:05, 13:05, 14:05, etc.
(2) at 12:05 the batch manager job is started according to the hourly
schedule, and this checks your custom batch job object records to see
if there is any work currently running or waiting.
It finds that there are no jobs running but there is a job waiting:
"Foo". The batch manager therefore starts the batch process for Foo.
(3) at 13:05 the batch manager job is started according to the hourly
schedule.
On this occasion it finds that job Foo is in progress and so quits
taking no action.
(4) at 13:35 job Foo finishes.
In the finish phase, the existing hourly scheduled job is cancelled,
and another new hourly job is scheduled, this time to run at 40
minutes past the hour: 13:40, 14:40, 15:40, etc.
(5) at 13:40 the batch manager job is due to start according to the
hourly schedule, but this fails (we assume because of platform
maintenance)
(6) at 14:40 the batch manager job is started according to the hourly
schedule.
It finds that there are no jobs running but there is a job waiting: "Bar". The batch manager therefore starts the batch process for Bar.
etc.
EXCEPTION: System.StringException: You can't abort scheduled apex jobs by calling system.abortjob with an AsyncApexJob ID.
You must call system.abortjob with the parent CronTrigger ID.
I tried the same in Workbench with a lower API version and it worked: I was able to delete 2 jobs using API version 32. Follow these steps to run the statement:
- Log in to "https://workbench.developerforce.com/login.php"
- In the right corner you will see your name and the API version. Click on that link
- There you will find the option to change the API version; change it to 32
- Go to Utilities >> Apex Execute
There, run the System.abortJob command with the job Id.
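For reference, here is a hedged sketch of what an abort by CronTrigger ID can look like in Anonymous Apex, following the hint in the error message. This is not necessarily the exact statement used above, and the job name 'My Scheduled Job' is a placeholder you would replace with your own.

```apex
// Run as Anonymous Apex (for example via Utilities >> Apex Execute in Workbench).
// 'My Scheduled Job' is a placeholder name, not taken from the original post.
for (CronTrigger ct : [SELECT Id, State, CronJobDetail.Name
                       FROM CronTrigger
                       WHERE CronJobDetail.Name = 'My Scheduled Job']) {
    // Pass the CronTrigger Id, not an AsyncApexJob Id, to System.abortJob.
    System.abortJob(ct.Id);
}
```

If you already know the CronTrigger ID, a single System.abortJob call with that ID is enough.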
I believe this would help you. Please let me know if you have any other questions, I would love to help you in that as well.
You are right that this is related to a known issue, but it has been addressed on SFSE before. See this question:
Ghost Schedulable Classes Blocking Deployment
@Ralph, an SFSE user with high reputation, has first-hand experience with this, and SF support was actually able to fix it for him.