[Salesforce] Maximum CPU time limit allowed to be exceeded in Sandbox

I recently deployed some code to a production org after significant bulk testing in my Sandbox. There were no problems, so I assumed it was OK to promote. After promotion, however, certain processes started to fail with errors indicating that the maximum CPU time was being exceeded.

Confused, I ran the same process with the same data in my sandbox, and it worked: the process completed, the records were updated, and no limit errors were raised, just as before.

However, when I looked at the logs for my operation, I saw the following:

[Image: debug log showing the CPU time used by the operation]

As you can see, that's more than just "close to the limit". My question is: how is this able to run on the sandbox (CS12) but not on live (NA20)? Is there some kind of sandbox grace period for this limit that I'm falling foul of? It's quite annoying, because if code works in the sandbox I assume it's going to be fine in production.

Best Answer

The system has the right to terminate any process that runs longer than 10,000 ms in execution time, but it does not have the responsibility to do so. This means that during peak loads, your code might crash at 10,001 ms because the system needs to keep resources available for contending processes, but it might happily run to completion during the midnight hours or on weekends.

During off-peak times, you might be allowed to run 15k, 20k, or more. I've never seen an official absolute maximum, although I'm a bit surprised by the 24.7k; the highest I've seen in production is about 15k. Unlike other governor limits, such as DML rows or SOQL queries, the 10k ms limit is more of a suggestion than a hard stop. You are guaranteed a minimum of 10k ms, so your code won't crash at 9,999 ms. But it may or may not crash at 10,001 ms, depending on whether the system needs the resources.

You shouldn't assume that, just because you reached 24.7k in the sandbox, the code would ever run in production. There's a slim chance that, at 2am with no traffic at all, it might run to completion. It will also probably take far less than 24.7k in production, because the system load will be lighter and production hardware is beefier than sandbox hardware. I'd imagine that the effective limit in sandbox testing is probably close to double that of production, because the hardware is only half as powerful and a transaction might therefore take twice as long to complete (confirmation or denial of this would be awesome).

In general, if you're going over 10,000 ms at all in Sandbox, you'll be running in the "close to limit" area in production (between 5k and 9k, most likely), at which point you should rightfully be concerned. At 15k in Sandbox, there's probably close to a 75% chance of sporadic failure in production, and anything over 20k in Sandbox should have a 100% chance of failing sporadically in production. You need to optimize your code, or offload the heavy tasks to future methods, batches, etc.
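
As a rough illustration of the "offload the heavy tasks" advice, here is a minimal sketch of a budget-aware loop: it works through a list until the CPU time consumed crosses a safety threshold below the guaranteed 10,000 ms, then hands back the unprocessed remainder so it can be deferred. It is written in plain Java as a stand-in for Apex. The class name, the 8,000 ms threshold, and the `processWithinBudget` helper are all hypothetical; the clock is passed in as a parameter so the idea can be exercised without a real org. In Apex, the elapsed figure would presumably come from `Limits.getCpuTime()` (with `Limits.getLimitCpuTime()` giving the limit), and the remainder would go to a future method or batch job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

class CpuBudgetSketch {
    // Guaranteed CPU time per transaction, as described in the answer.
    static final long GUARANTEED_CPU_MS = 10_000;

    // Hypothetical safety margin: stop early rather than gamble on
    // being allowed to run past the guaranteed minimum.
    static final long SAFE_THRESHOLD_MS = 8_000;

    /**
     * Runs `work` on each item in order until the CPU time reported by
     * `cpuMsUsed` reaches SAFE_THRESHOLD_MS, then returns the items that
     * were NOT processed so the caller can defer them.
     */
    static <T> List<T> processWithinBudget(List<T> items,
                                           LongSupplier cpuMsUsed,
                                           Consumer<T> work) {
        int next = 0;
        while (next < items.size() && cpuMsUsed.getAsLong() < SAFE_THRESHOLD_MS) {
            work.accept(items.get(next));
            next++;
        }
        // Copy the tail so the result does not depend on the input list.
        return new ArrayList<>(items.subList(next, items.size()));
    }

    public static void main(String[] args) {
        // Fake clock for demonstration: each item "costs" 3,000 ms of CPU.
        long[] fakeCpuMs = {0};
        List<String> records = List.of("a", "b", "c", "d", "e");

        List<String> deferred = processWithinBudget(
                records,
                () -> fakeCpuMs[0],
                record -> fakeCpuMs[0] += 3_000);

        System.out.println("Deferred for later processing: " + deferred);
    }
}
```

The check happens before each item, so a single very expensive item can still overshoot the threshold; that is why the threshold in this sketch sits well below the guaranteed minimum rather than at it.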
