[Salesforce] Platform Event Apex trigger limits exception: where does it go?

When a Platform Event is published, the consuming trigger executes as the Automated Process user. If that trigger exceeds the CPU limit, no notification is sent to the developer(s), even though:

  • Apex Exception Email is configured to route to me
  • my user is configured to receive Apex Warning Emails

How does one discover that there was a limits exception so action can be taken?

Platform Event Foo__e

Apex Trigger

trigger FooTrigger on Foo__e (after insert) {
    Util.sleep(30); // helper that burns roughly the given number of seconds of CPU time
}
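The question does not show Util.sleep, so the following is only a guess at what such a helper might look like: a busy-wait that spins until the transaction has consumed roughly the requested CPU time. The class body is an assumption; only the name Util.sleep comes from the trigger above.

```apex
// Util.cls (hypothetical implementation; not shown in the original question)
public class Util {
    // Busy-waits until about `seconds` of CPU time have been consumed.
    // With a large enough argument the loop exceeds the CPU governor limit
    // and the platform raises an uncatchable System.LimitException.
    public static void sleep(Integer seconds) {
        Long targetMs = Limits.getCpuTime() + (seconds * 1000L);
        while (Limits.getCpuTime() < targetMs) {
            // Intentionally empty: the loop itself consumes CPU time.
        }
    }
}
```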

Developer Console

EventBus.publish(new Foo__e());

I have no idea where to find the limits exception email or any other indication of failure; querying EventBusSubscriber yields no information about errors of this sort.
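For reference, a check along these lines can be run from Execute Anonymous. It is a sketch of the kind of query meant here, not the exact one used; as noted, it did not surface the limits failure.

```apex
// Inspect the trigger's subscription to the Foo__e channel.
// LastError and Status are standard EventBusSubscriber fields, but in this
// scenario they gave no hint that the trigger had hit a governor limit.
for (EventBusSubscriber sub : [
    SELECT Name, Type, Position, Tip, Retries, LastError, Status
    FROM EventBusSubscriber
    WHERE Topic = 'Foo__e'
]) {
    System.debug(sub);
}
```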

Support ticket filed

UPDATE:

Lesson to us all: a limits failure fails silently when the running
user is Automated Process. Because a platform event trigger is
bulkified, if one event in the batch is, say, exceptionally CPU- or
heap-intensive, the other events in the batch fail (silently) as
"collateral damage" and go undetected until you do a database audit.
Adrian Larson's answer is a practical design way out of this.

Ask me how I know 🙁

Best Answer

My plan to work around this limitation would be roughly as below.

Configuration

  1. Use a logging object to track each Platform Event you process in your trigger.
    • Make sure this logging object has no required fields or validation rules, so the insert itself cannot fail.
  2. Add a Text (18) field to track Job_Id__c.
    • You can later use this field to query AsyncApexJob.
  3. Add a Text field to track the Job_Status__c.
  4. Add a TextArea field to track the Job_Error_Message__c.

Code Changes

  1. Move your core logic to a Queueable.
  2. From your subscriber trigger, fire this async job so any steps which can fail will take place in a separate transaction.
  3. From your subscriber trigger, insert a record into your log object.
  4. On your log record, set the Job_Id__c field.
  5. Set up a scheduled batch which iterates over any log records whose Job_Status__c is still non-final, i.e. in (null, 'Holding', 'Queued', 'Preparing', 'Processing').
    • Match each Job_Id__c up to the corresponding AsyncApexJob.
    • Map Status to Job_Status__c.
    • Map ExtendedStatus to Job_Error_Message__c.
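
The steps above could be sketched roughly as follows. This is a minimal outline under stated assumptions, not a tested implementation: the log object name Event_Log__c and the class names FooProcessor and EventLogStatusBatch are hypothetical, and each top-level type would live in its own file. Only Foo__e, FooTrigger, and the three Job_* fields come from the discussion above.

```apex
// FooProcessor.cls (hypothetical): core logic moved out of the trigger.
public class FooProcessor implements Queueable {
    private final List<Foo__e> events;
    public FooProcessor(List<Foo__e> events) {
        this.events = events;
    }
    public void execute(QueueableContext context) {
        for (Foo__e evt : events) {
            // Anything that can hit a limit or throw goes here, so the
            // failure is recorded on the AsyncApexJob row for this job.
        }
    }
}

// FooTrigger.trigger: enqueue the job, then log its Id once per event.
trigger FooTrigger on Foo__e (after insert) {
    List<Foo__e> events = (List<Foo__e>) Trigger.new;
    Id jobId = System.enqueueJob(new FooProcessor(events));
    List<Event_Log__c> logs = new List<Event_Log__c>();
    for (Foo__e evt : events) {
        logs.add(new Event_Log__c(Job_Id__c = jobId));
    }
    insert logs;
}

// EventLogStatusBatch.cls (hypothetical): copies job outcomes onto the logs.
public class EventLogStatusBatch implements Database.Batchable<SObject>, Schedulable {
    public Database.QueryLocator start(Database.BatchableContext context) {
        // Only logs whose job has not yet reached a final status.
        return Database.getQueryLocator([
            SELECT Job_Id__c, Job_Status__c, Job_Error_Message__c
            FROM Event_Log__c
            WHERE Job_Id__c != null
            AND (Job_Status__c = null
                 OR Job_Status__c IN ('Holding', 'Queued', 'Preparing', 'Processing'))
        ]);
    }
    public void execute(Database.BatchableContext context, List<Event_Log__c> logs) {
        Set<Id> jobIds = new Set<Id>();
        for (Event_Log__c log : logs) {
            jobIds.add((Id) log.Job_Id__c);
        }
        Map<Id, AsyncApexJob> jobs = new Map<Id, AsyncApexJob>([
            SELECT Status, ExtendedStatus FROM AsyncApexJob WHERE Id IN :jobIds
        ]);
        for (Event_Log__c log : logs) {
            AsyncApexJob job = jobs.get((Id) log.Job_Id__c);
            if (job == null) {
                continue; // job row not found; leave the log for a later run
            }
            log.Job_Status__c = job.Status;
            log.Job_Error_Message__c = job.ExtendedStatus;
        }
        update logs;
    }
    public void finish(Database.BatchableContext context) {}
    // Schedulable entry point: each scheduled run starts one batch.
    public void execute(SchedulableContext context) {
        Database.executeBatch(new EventLogStatusBatch());
    }
}
```

To run the reconciliation hourly, something like System.schedule('Event log status sync', '0 0 * * * ?', new EventLogStatusBatch()); from Execute Anonymous should do. Log records whose Job_Status__c ends up as 'Failed' are the ones to alert on, with Job_Error_Message__c carrying the limit or exception text.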