Rather than scheduling a one-time job, schedule a recurring job.
Schedule the job to run every hour. In the finish phase of your job, cancel this hourly schedule and replace it with a new hourly schedule whose first execution falls a short time (say, 5 minutes) after the job finishes.
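A minimal sketch of the rescheduling arithmetic, written in Python for illustration. It assumes the Salesforce scheduler's cron field order (Seconds Minutes Hours Day_of_month Month Day_of_week). In Apex, the resulting string would be passed to `System.schedule(jobName, cronExpression, schedulable)` after the old schedule is cancelled with `System.abortJob(jobId)`. The helper names `next_hourly_cron` and `next_fire_times` are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def next_hourly_cron(finish: datetime, delay: timedelta = timedelta(minutes=5)) -> str:
    """Cron expression (Seconds Minutes Hours Day_of_month Month Day_of_week)
    that fires every hour at the minute reached `delay` after `finish`."""
    first_run = finish + delay
    return f"0 {first_run.minute} * * * ?"


def next_fire_times(finish: datetime, count: int = 3,
                    delay: timedelta = timedelta(minutes=5)) -> List[datetime]:
    """The first `count` hourly fire times implied by next_hourly_cron.
    Seconds are truncated, so keep `delay` at one minute or more."""
    first_run = (finish + delay).replace(second=0, microsecond=0)
    return [first_run + timedelta(hours=i) for i in range(count)]


finish = datetime(2024, 1, 1, 13, 35)  # arbitrary date; the job finishes at 13:35
print(next_hourly_cron(finish))                                  # 0 40 * * * ?
print([t.strftime("%H:%M") for t in next_fire_times(finish)])    # ['13:40', '14:40', '15:40']
```

Because only the minute field is fixed, the platform retries every hour at that minute until the next finish phase replaces the schedule.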
This works much like a "one-off" schedule (as in your existing implementation): in both approaches the job is rescheduled in the finish phase. A recurring schedule has an added benefit, though. If for any reason the job does not execute, the platform will try again an hour later, and every hour after that until it succeeds.
Note that we don't know why the job may fail to execute; we're assuming it relates to platform maintenance. Chaining one-off scheduled jobs together relies on each job starting and completing successfully to keep the chain intact, whereas a recurring scheduled job provides "auto-resume" behaviour regardless of whether an individual job starts or completes.
Example process flow:
(1) At 12:00 we schedule a job to run every hour at 5 minutes past the hour: 12:05, 13:05, 14:05, etc.
(2) At 12:05 the batch manager job starts according to the hourly schedule and checks your custom batch job object records to see whether any work is currently running or waiting. It finds that no jobs are running but one job is waiting: "Foo". The batch manager therefore starts the batch process for Foo.
(3) At 13:05 the batch manager job starts according to the hourly schedule. This time it finds that job Foo is in progress, so it quits without taking any action.
(4) At 13:35 job Foo finishes. In the finish phase, the existing hourly scheduled job is cancelled and a new hourly job is scheduled, this time to run at 40 minutes past the hour: 13:40, 14:40, 15:40, etc.
(5) At 13:40 the batch manager job is due to start according to the hourly schedule, but it fails to run (we assume because of platform maintenance).
(6) At 14:40 the batch manager job starts according to the hourly schedule. It finds that no jobs are running but one job is waiting: "Bar". The batch manager therefore starts the batch process for Bar.
etc.
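The process flow above can be modelled as a small simulation. This is a hypothetical Python sketch of the timing logic only, not Salesforce code: `BatchManager`, `simulate`, and the job list stand in for the scheduled Apex class and the custom batch job records. It replays steps (1) to (6), including the missed 13:40 run that the next hourly run at 14:40 picks up.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class BatchManager:
    """Hypothetical model of the hourly batch manager job."""
    waiting: List[str]                # queued jobs (stand-in for the custom batch job records)
    minute: int = 5                   # minute past the hour the current schedule fires
    running: Optional[str] = None     # job currently in progress, if any
    log: List[Tuple[str, str]] = field(default_factory=list)

    def note(self, now: datetime, message: str) -> None:
        self.log.append((now.strftime("%H:%M"), message))

    def on_schedule(self, now: datetime) -> None:
        """Hourly execution: quit if a job is running, else start the next waiting one."""
        if self.running is not None:
            self.note(now, f"{self.running} in progress, quit")
        elif self.waiting:
            self.running = self.waiting.pop(0)
            self.note(now, f"start {self.running}")

    def on_finish(self, now: datetime, delay: timedelta = timedelta(minutes=5)) -> None:
        """Finish phase: cancel the hourly schedule and create one at finish + delay."""
        self.note(now, f"{self.running} finished")
        self.running = None
        self.minute = (now + delay).minute


def simulate(manager: BatchManager, start: datetime, end: datetime,
             finishes: Dict[str, datetime], failures: Set[datetime]) -> List[Tuple[str, str]]:
    """Walk minute by minute; `finishes` maps each job to its finish time and
    `failures` holds scheduled times the platform fails to execute."""
    now = start
    while now <= end:
        if manager.running is not None and finishes.get(manager.running) == now:
            manager.on_finish(now)
        if now.minute == manager.minute:
            if now in failures:
                manager.note(now, "scheduled run did not execute")
            else:
                manager.on_schedule(now)
        now += timedelta(minutes=1)
    return manager.log


def at(hour: int, minute: int) -> datetime:
    return datetime(2024, 1, 1, hour, minute)  # arbitrary date


manager = BatchManager(waiting=["Foo", "Bar"])
log = simulate(manager, at(12, 0), at(15, 0),
               finishes={"Foo": at(13, 35)}, failures={at(13, 40)})
for stamp, event in log:
    print(stamp, event)
```

With a chain of one-off jobs, the failure at 13:40 would end the chain; here the schedule itself survives, so Bar still starts at 14:40.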
I haven't seen that, but when I'm debugging scheduled batches, I typically schedule the job via the Developer Console to run in 2 minutes' time and then watch the logs. Even if something isn't written to the jobs queue, you should see it in the logs for the running user (which would be you, if the job was scheduled from the Developer Console).
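For a one-off debug run, the cron expression must pin an exact date and time. A Python sketch of building it "2 minutes from now", assuming the Salesforce field order with the optional year (Seconds Minutes Hours Day_of_month Month Day_of_week Year); in Apex the same fields would come from something like `Datetime.now().addMinutes(2)`. The helper name `one_off_cron` is hypothetical.

```python
from datetime import datetime, timedelta


def one_off_cron(now: datetime, delay: timedelta = timedelta(minutes=2)) -> str:
    """Cron expression that fires once, at the start of the minute reached
    `delay` after `now` (seconds are dropped, so it may fire slightly early)."""
    run_at = now + delay  # datetime arithmetic handles day/month/year rollover
    return f"0 {run_at.minute} {run_at.hour} {run_at.day} {run_at.month} ? {run_at.year}"


print(one_off_cron(datetime(2024, 3, 15, 10, 30)))   # 0 32 10 15 3 ? 2024
print(one_off_cron(datetime(2024, 12, 31, 23, 59)))  # 0 1 0 1 1 ? 2025
```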
Impact on scheduled jobs of the fall daylight saving time change
Knowledge Article Number: 000170765
Description: On Nov 4th, daylight saving time ends (the "fall back"). Could you please let us know how this is handled in the SFDC cloud? In general, clocks are set back from 1:59 AM to 1:00 AM, so jobs scheduled at 1 AM would run twice on that day. Is there the same impact in SFDC?
Resolution: The scheduled date/time is expressed in GMT, so the job is executed only once.
Say the user is in GMT+5 (with daylight saving applied). 1 AM GMT+5 is 8 PM GMT on the previous day, so the job is executed at 8 PM GMT. When daylight saving stops applying at 2 AM GMT+5 (9 PM GMT), the user's time is displayed as 1 AM GMT+4 (still 9 PM GMT). We don't go back in time; we just change the offset from GMT.
The same goes for the onset of daylight saving: just apply the corresponding offset to the time displayed to the user.
Here is the link to the Salesforce Knowledge Article: https://help.salesforce.com/apex/HTViewSolution?id=000170765&language=en_US
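The arithmetic in the resolution can be checked with Python's timezone-aware datetimes. This sketch uses fixed offsets taken from the article's example (a zone moving from GMT+5 to GMT+4); the year 2018 and the helper `count_executions` are illustrative assumptions, and the scheduler is modelled simply as something that compares UTC instants.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
DST_OFFSET = timezone(timedelta(hours=5))  # user's offset while daylight saving applies
STD_OFFSET = timezone(timedelta(hours=4))  # user's offset after it stops applying


def count_executions(scheduled_utc: datetime, start_utc: datetime, end_utc: datetime,
                     tick: timedelta = timedelta(minutes=1)) -> int:
    """Step through UTC time and count how often the job's UTC instant is reached."""
    runs = 0
    now = start_utc
    while now < end_utc:
        if now == scheduled_utc:
            runs += 1
        now += tick
    return runs


# The job is scheduled for 1:00 local while daylight saving applies: 8 PM GMT the day before.
scheduled_utc = datetime(2018, 11, 4, 1, 0, tzinfo=DST_OFFSET).astimezone(UTC)

# 2 AM GMT+5 and 1 AM GMT+4 are the same instant (9 PM GMT).
assert datetime(2018, 11, 4, 2, 0, tzinfo=DST_OFFSET) == datetime(2018, 11, 4, 1, 0, tzinfo=STD_OFFSET)

# The local clock reads 1:00 twice, one hour apart in GMT...
first_1am = datetime(2018, 11, 4, 1, 0, tzinfo=DST_OFFSET).astimezone(UTC)
second_1am = datetime(2018, 11, 4, 1, 0, tzinfo=STD_OFFSET).astimezone(UTC)

# ...but only one of those instants matches the stored schedule, so the job runs once.
window_start = datetime(2018, 11, 3, 18, 0, tzinfo=UTC)
window_end = datetime(2018, 11, 3, 23, 0, tzinfo=UTC)
print(scheduled_utc, first_1am, second_1am)
print(count_executions(scheduled_utc, window_start, window_end))  # 1
```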