[Salesforce] Scheduled Apex not consistently executing, sending emails

I'm having a problem similar to this user's question from about a year ago, although it's not clear to me whether the solution provided there actually resolved the issue (or if it just made the Salesforce UX better represent the state of things on the Apex Jobs page).

I have a scheduled class that is intended to run nightly. It searches an object for records where a custom checkbox = True, and then sends an email with details about those records to an inbox for archiving. The schedulable class is as follows:

global class ScheduledArchiveProcess implements Schedulable {

    // Runs job every day at 11PM
    private static final String CRON_EXP = '0 0 23 * * ? *';

    global void execute(SchedulableContext sc) {
        new CaseArchiveService().doArchive();
    }

    public static void setSchedule(String name) {
        System.schedule(name, CRON_EXP, new ScheduledArchiveProcess());
    }

    public static void setSchedule(String name, String cronExp) {
        System.schedule(name, cronExp, new ScheduledArchiveProcess());
    }
}

When I schedule the job (either via the UX or via the setSchedule methods above), it does not consistently send emails. I will set up test data and schedule the class, then nothing will happen for days. For whatever reason, several days into the schedule it will fire off the expected emails, but then it returns to not running consistently. The Scheduled Jobs page implies it is running, but I have no emails to show for it.

I can directly run the code via Execute Anonymous and have the code consistently execute, so I don't think that's a problem (although given the number of records in the object we're querying, the SOQL query does take close to a minute to run when I submit via EA).

Given the scenario described above, I have the following questions:

  1. Is there any known reason why a class would run inconsistently when scheduled, despite always running when directly invoked?
  2. Would the solution documented in the aforementioned question – aborting the current job, then rescheduling for a later time – actually address the problem I'm having, or would it just cause the Apex Jobs page to be more explicit about what has/has not been done?
  3. Is there any way to add debugging to a scheduled job that is not being run under a specific user profile (or am I mistaken here and it is being run under a profile)?

I should note that I'm an Apex novice, working with code that was completed (but not turned on) before our previous developer departed, so apologies for any obvious misses or boneheaded statements. Thanks in advance for your assistance!

Update 4/9/15: I suspect this may have something to do with Salesforce timing out the query when initiated via a scheduled job. When I run it via the scheduler, I get up to the query and find the following:

14:00:20.936 (936955661)|METHOD_EXIT|[40]|01pF0000002QxZ2|ArchiveService.getObjectArchiveQuery()
14:00:20.936 (936987788)|SYSTEM_METHOD_ENTRY|[40]|Database.query(String)
14:00:20.940 (940831639)|SOQL_EXECUTE_BEGIN|[40]|Aggregations:0|SELECT [a whole bunch of fields] FROM Case WHERE needs_archive__c = true
14:02:24.902 (124902031857)|SYSTEM_MODE_EXIT|false

However, when I run the same code via Execute Anonymous, it continues past this point:

13:51:44.034 (1034339365)|METHOD_EXIT|[40]|01pF0000002QxZ2|ArchiveService.getObjectArchiveQuery()
13:51:44.038 (1038006973)|SOQL_EXECUTE_BEGIN|[40]|Aggregations:0|SELECT [a whole bunch of fields] FROM Case WHERE needs_archive__c = true
13:53:07.150 (84150020438)|SOQL_EXECUTE_END|[40]|Rows:2

(I've omitted the SOQL SELECT for brevity – it's just grabbing a whole bunch of fields)

Best Answer

Things to think about...

  1. Since the scheduled job has to execute under some user, you can set up a debug log for that user and monitor for errors.
  2. There is a limit on outbound Apex emails per day; perhaps some other job has exceeded the limit before the job in question has executed.
  3. There could be some query timeout or other catchable exception, and the code as written doesn't fail gracefully.
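
To check the first two points, something like the following could be run via Execute Anonymous. This is only a sketch: the job name 'Nightly Case Archive' is a made-up placeholder for whatever name was passed to System.schedule, and the capacity check assumes the archive email counts against the daily single-email limit.

```apex
// Execute Anonymous sketch. 'Nightly Case Archive' is a hypothetical job name;
// substitute the name that was passed to System.schedule.
for (CronTrigger ct : [
    SELECT Id, OwnerId, State, PreviousFireTime, NextFireTime, TimesTriggered
    FROM CronTrigger
    WHERE CronJobDetail.Name = 'Nightly Case Archive'
]) {
    // OwnerId is the user who scheduled the job. Setting up a debug log
    // for that user should capture the scheduled runs.
    System.debug('Owner: ' + ct.OwnerId
        + ', state: ' + ct.State
        + ', times triggered: ' + ct.TimesTriggered
        + ', last fired: ' + ct.PreviousFireTime
        + ', next fire: ' + ct.NextFireTime);
}

try {
    // Throws if reserving capacity for one more single email would
    // exceed the org's daily limit.
    Messaging.reserveSingleEmailCapacity(1);
    System.debug('Email capacity is available right now.');
} catch (Exception e) {
    System.debug(LoggingLevel.ERROR, 'Email limit problem: ' + e.getMessage());
}
```

If PreviousFireTime keeps advancing but no email arrives, the job is firing and the failure is somewhere inside the archive logic rather than in the scheduler.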

With the exception of limit exceptions (which cannot be caught), asynchronous code should use try-catch to ensure that every exception is caught and some diagnostic log is written for later examination. Such a log could be an email to the sysadmin and/or a custom Log__c SObject record listing the interesting things that happened during execution. Dan Appleman's book 'Advanced Apex Programming' has a good debugging framework worth considering or adapting.
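
As a rough illustration of that pattern (not the framework from the book), the execute method from the question could be wrapped as below. Log__c and its Message__c and Stack_Trace__c fields are hypothetical and would need to be created in the org first; CaseArchiveService is the class from the question.

```apex
global class ScheduledArchiveProcess implements Schedulable {

    global void execute(SchedulableContext sc) {
        try {
            new CaseArchiveService().doArchive();
        } catch (Exception e) {
            // Hypothetical custom object and fields: adjust to whatever
            // logging object exists in the org.
            insert new Log__c(
                Message__c = e.getTypeName() + ': ' + e.getMessage(),
                Stack_Trace__c = e.getStackTraceString()
            );
        }
    }
}
```

A governor limit failure (System.LimitException) will still terminate the job without reaching the catch block, so the debug log for the scheduling user remains the fallback for those cases.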