[Salesforce] Dealing with auto-batching of triggers

I've recently been trying to come to terms with the oddities of the auto-"batching" of triggers. What I'm referring to is how, when more than 200 records are involved in a trigger execution, Salesforce automatically "batches" the records into groups of 200. For example, if I run a trigger on 500 Account records, the trigger logic runs three times in succession: on the first 200 records, then the next 200, then the last 100, and governor limits are never reset between those runs.

This doesn't pose much of a problem in a small-scale operation, but when working in a Salesforce instance with thousands of users, large quantities of data, and extensive automation, these issues are bound to come up.

My first question: what is the best practice for handling a trigger firstRun variable? Since workflows can often cause a trigger to run twice, it's important to pass through certain pieces of logic only the first time. As for order of execution, I've determined that the first 200 records move through the trigger, then workflow, then the trigger again; then the next "batch" is processed, and so on. Because of this predictable order, the best approach I've come up with is the following (I've simplified the overarching structure to demonstrate the logic):

public class MyObjectServices {
    public static Set<Id> recordsProcessed = new Set<Id>();

    public static void myTriggerMethod(Map<Id, MyObject__c> newMap){
        //The number of records processed in all "batches" to this point
        Integer sizeBefore = recordsProcessed.size();

        //Add the Ids of all records being processed in this "batch"; the
        //Set silently ignores any Ids that have already been processed
        recordsProcessed.addAll(newMap.keySet());

        //If the Set grew, then at least one record in the current
        //"batch" has not been processed before
        if (recordsProcessed.size() != sizeBefore){
            //Trigger logic
        }
    }
}

However, this 1) doesn't work for before insert, since the records don't have Ids yet, and 2) feels legitimately hacky. Are there any better ways to do this?

My second question: Is there any way to get the total number of records being processed in your trigger? Trigger.size will only give you the number being processed in that particular batch. We could use the recordsProcessed variable from above, but that will only give you the total number during the last "batch" of the trigger. And there's no way of knowing which batch that will be 100% of the time! Yes, if Trigger.size != 200, you know it's the last batch, but what if you're dealing with an exact multiple of 200?

My third question: finally, how do you handle limits? Suppose you update 10,000 records of one type. That means 50 separate "batches" within the same execution! Even if you follow bulkification practices perfectly, that leaves you only 2 queries per batch (of the 100 total) and 3 DML statements per batch (of the 150 total) before you hit governor limits. This is straight-up unworkable. All the ideas I can come up with involve incredibly over-engineered methods of separating records into groups of 200 or fewer and using Queueables to process them. But that makes it very difficult to do any validation, and any implementation would be complicated enough that a dedicated technical architect would need to furiously review all incoming code.

So, do any of you have any insight how to face these problems in doing large-scale trigger automation?

EDIT: Adding an example to clear up some misconceptions about how trigger "batching" affects limits and static variables. Trigger:

trigger FooTrigger on Foo__c (before update) {

    System.debug('Trigger before handler call');
    FooTriggerHandler fth = new FooTriggerHandler(
        trigger.oldMap,
        trigger.newMap,
        trigger.old,
        trigger.new,
        trigger.isInsert,
        trigger.isUpdate,
        trigger.isDelete,
        trigger.isUndelete,
        trigger.isBefore,
        trigger.isAfter,
        trigger.size
    );
    System.debug('Trigger after handler call');
}

Apex class:

public class FooTriggerHandler {

    public static Set<Id> allRecordIds = new Set<Id>();
    public static Boolean firstRun = true;
    public static Integer count = 0;

    public FooTriggerHandler(Map<Id,Foo__c> oldMap, Map<Id,Foo__c> newMap, List<Foo__c> triggerOld, List<Foo__c> triggerNew, 
    Boolean isInsert, Boolean isUpdate, Boolean isDelete, Boolean isUndelete, Boolean isBefore, Boolean isAfter, Integer size){
        System.debug('Trigger.size: ' + size);
        System.debug('Entering dispatcher constructor');
        System.debug('DML before: ' + Limits.getDmlStatements());
        System.debug('DML rows before: ' + Limits.getDmlRows());
        System.debug('Queries before: ' + Limits.getQueries());
        System.debug('Query rows before: ' + Limits.getQueryRows());
        System.debug('Firstrun: ' + firstRun);
        firstRun = false;
        System.debug('Record Ids before: ' + allRecordIds.size());
        List<User> someUsers = [SELECT Id FROM User LIMIT 2];
        for (Foo__c f : triggerNew){
           allRecordIds.add(f.Id);
        }
        System.debug('Record Ids after: ' + allRecordIds.size());
        Contact contact1 = new Contact(LastName = 'TestContact' + count++);
        Contact contact2 = new Contact(LastName = 'TestContact' + count++);
        insert new List<Contact>{contact1, contact2};
        System.debug('DML after: ' + Limits.getDmlStatements());
        System.debug('DML rows after: ' + Limits.getDmlRows());
        System.debug('Queries after: ' + Limits.getQueries());
        System.debug('Query rows after: ' + Limits.getQueryRows());
    }

}

Meanwhile, I have a workflow rule running an update every time I update a Foo__c (in order to demonstrate the trigger re-running).

I run the following anonymous code: update [SELECT Id FROM Foo__c LIMIT 300];

And my debug logs:

//First trigger batch, first run
09:18:00:143 USER_DEBUG [3]|DEBUG|Trigger before handler call
09:18:00:144 USER_DEBUG [9]|DEBUG|Trigger.size: 200
09:18:00:144 USER_DEBUG [10]|DEBUG|Entering dispatcher constructor
09:18:00:144 USER_DEBUG [11]|DEBUG|DML before: 1
09:18:00:144 USER_DEBUG [12]|DEBUG|DML rows before: 300
09:18:00:144 USER_DEBUG [13]|DEBUG|Queries before: 1
09:18:00:144 USER_DEBUG [14]|DEBUG|Query rows before: 300
09:18:00:144 USER_DEBUG [15]|DEBUG|Firstrun: true
09:18:00:144 USER_DEBUG [17]|DEBUG|Record Ids before: 0
09:18:00:391 USER_DEBUG [22]|DEBUG|Record Ids after: 200
09:18:01:161 USER_DEBUG [26]|DEBUG|DML after: 2
09:18:01:161 USER_DEBUG [27]|DEBUG|DML rows after: 302
09:18:01:161 USER_DEBUG [28]|DEBUG|Queries after: 2
09:18:01:161 USER_DEBUG [29]|DEBUG|Query rows after: 302
09:18:01:161 USER_DEBUG [17]|DEBUG|Trigger after handler call
//First trigger batch, after workflow
09:18:01:798 USER_DEBUG [3]|DEBUG|Trigger before handler call
09:18:01:799 USER_DEBUG [9]|DEBUG|Trigger.size: 200
09:18:01:799 USER_DEBUG [10]|DEBUG|Entering dispatcher constructor
09:18:01:799 USER_DEBUG [11]|DEBUG|DML before: 2
09:18:01:799 USER_DEBUG [12]|DEBUG|DML rows before: 302
09:18:01:799 USER_DEBUG [13]|DEBUG|Queries before: 2
09:18:01:799 USER_DEBUG [14]|DEBUG|Query rows before: 302
09:18:01:799 USER_DEBUG [15]|DEBUG|Firstrun: false
09:18:01:799 USER_DEBUG [17]|DEBUG|Record Ids before: 200
09:18:01:890 USER_DEBUG [22]|DEBUG|Record Ids after: 200
09:18:01:952 USER_DEBUG [26]|DEBUG|DML after: 3
09:18:01:952 USER_DEBUG [27]|DEBUG|DML rows after: 304
09:18:01:952 USER_DEBUG [28]|DEBUG|Queries after: 3
09:18:01:952 USER_DEBUG [29]|DEBUG|Query rows after: 304
09:18:01:952 USER_DEBUG [17]|DEBUG|Trigger after handler call
//Second trigger batch, first run
09:18:02:623 USER_DEBUG [3]|DEBUG|Trigger before handler call
09:18:02:623 USER_DEBUG [9]|DEBUG|Trigger.size: 100
09:18:02:623 USER_DEBUG [10]|DEBUG|Entering dispatcher constructor
09:18:02:624 USER_DEBUG [11]|DEBUG|DML before: 3
09:18:02:624 USER_DEBUG [12]|DEBUG|DML rows before: 304
09:18:02:624 USER_DEBUG [13]|DEBUG|Queries before: 3
09:18:02:624 USER_DEBUG [14]|DEBUG|Query rows before: 304
09:18:02:624 USER_DEBUG [15]|DEBUG|Firstrun: false
09:18:02:624 USER_DEBUG [17]|DEBUG|Record Ids before: 200
09:18:02:624 USER_DEBUG [22]|DEBUG|Record Ids after: 300
09:18:02:624 USER_DEBUG [26]|DEBUG|DML after: 4
09:18:02:624 USER_DEBUG [27]|DEBUG|DML rows after: 306
09:18:02:624 USER_DEBUG [28]|DEBUG|Queries after: 4
09:18:02:624 USER_DEBUG [29]|DEBUG|Query rows after: 306
09:18:02:624 USER_DEBUG [17]|DEBUG|Trigger after handler call
//Second trigger batch, after workflow
09:18:03:180 USER_DEBUG [3]|DEBUG|Trigger before handler call
09:18:03:180 USER_DEBUG [9]|DEBUG|Trigger.size: 100
09:18:03:180 USER_DEBUG [10]|DEBUG|Entering dispatcher constructor
09:18:03:180 USER_DEBUG [11]|DEBUG|DML before: 4
09:18:03:180 USER_DEBUG [12]|DEBUG|DML rows before: 306
09:18:03:180 USER_DEBUG [13]|DEBUG|Queries before: 4
09:18:03:180 USER_DEBUG [14]|DEBUG|Query rows before: 306
09:18:03:180 USER_DEBUG [15]|DEBUG|Firstrun: false
09:18:03:180 USER_DEBUG [17]|DEBUG|Record Ids before: 300
09:18:03:511 USER_DEBUG [22]|DEBUG|Record Ids after: 300
09:18:03:603 USER_DEBUG [26]|DEBUG|DML after: 5
09:18:03:604 USER_DEBUG [27]|DEBUG|DML rows after: 308
09:18:03:604 USER_DEBUG [28]|DEBUG|Queries after: 5
09:18:03:604 USER_DEBUG [29]|DEBUG|Query rows after: 308
09:18:03:604 USER_DEBUG [17]|DEBUG|Trigger after handler call

Observations:

  1. Using a static firstRun variable only affects the first batch of 200 records; the variable remains false while processing all subsequent batches. Only use this when you legitimately want code to run exactly once per transaction, not once per batch.
  2. Limits are not reset between batches, nor between re-runs due to workflow. You could technically query the same 500 records each time, and they would count repeatedly toward your query rows, potentially causing you to hit limits without actually working in bulk.
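
Since limits accumulate this way, one modest mitigation is to instrument your handler with the same Limits methods used above, so each batch can report how much headroom it has left. A minimal, illustrative sketch:

Integer queriesLeft = Limits.getLimitQueries() - Limits.getQueries();
Integer dmlLeft = Limits.getLimitDmlStatements() - Limits.getDmlStatements();
//Usage carries over from every previous batch and workflow re-run,
//so this headroom shrinks as the transaction progresses
System.debug('SOQL headroom: ' + queriesLeft + ', DML headroom: ' + dmlLeft);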

Best Answer

However, this 1) doesn't work for before insert, since the records don't have Ids yet, and 2) feels legitimately hacky. Are there any better ways to do this?

Your approach (Set<Id> recordsProcessed) is a good one. You do not have to prevent before insert trigger recursion, so it's really not an issue that records do not have an Id yet in that case. I don't find it to be "hacky", and this approach is more robust than a simple Boolean flag (which will only operate correctly on the first batch).
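
For illustration, here's a per-record variant of that guard (class and method names are hypothetical). Rather than gating the whole batch on whether the Set grew, it filters down to just the unprocessed records, which also handles a batch that mixes first-time and re-run records:

public class MyObjectTriggerHandler {
    private static Set<Id> processedIds = new Set<Id>();

    public static void onAfterUpdate(Map<Id, MyObject__c> newMap){
        //Set.add() returns true only when the Id was not already present,
        //so this keeps just the records this transaction hasn't seen yet
        Map<Id, MyObject__c> firstTimers = new Map<Id, MyObject__c>();
        for (Id recordId : newMap.keySet()){
            if (processedIds.add(recordId)){
                firstTimers.put(recordId, newMap.get(recordId));
            }
        }
        if (firstTimers.isEmpty()){
            return;
        }
        //Trigger logic, against firstTimers only
    }
}

And before insert needs no guard at all: the same records cannot fire before insert twice in one transaction, so the missing Ids are moot there.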

Is there any way to get the total number of records being processed in your trigger?

I don't believe so, unless you set it from the calling context. For instance, you could do something like:

public with sharing class LeadService
{
    public static Integer recordsToProcess = 0;

    // service methods
}

/*VVV calling context VVV*/
List<Lead> toUpdate; // = <some_list>
LeadService.recordsToProcess = toUpdate.size();
update toUpdate;

However, you might have further updates to leads with different values. I would avoid this strategy and find other ways around this limitation. It shouldn't matter which batch comes last. If you want to make sure some logic happens after your trigger logic completes, consider asynchronous processing.

how do you handle limits?

Two common strategies for easing limits usage are to:

  • Use asynchronous processing for heavy lifting
  • Use lazy loading to re-use common data

With the former strategy, you can trade queries/dml/cpu for async calls. It can be more difficult to prevent trigger recursion, but you should be able to work around it by careful application of criteria (filters).
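
To make that hand-off concrete, here's a rough sketch (the object, class, and criteria are placeholders). The Queueable runs in its own transaction with a fresh set of governor limits. One caveat I'd flag: a synchronous transaction can enqueue at most 50 Queueable jobs, so enqueueing once per batch in the 10,000-record scenario sits right at that ceiling.

public class FooHeavyLiftingJob implements Queueable {
    private Set<Id> recordIds;

    public FooHeavyLiftingJob(Set<Id> recordIds){
        this.recordIds = recordIds;
    }

    public void execute(QueueableContext context){
        //Runs later in its own transaction; re-query by Id rather than
        //serializing whole sObjects into the job
        List<Foo__c> records = [SELECT Id FROM Foo__c WHERE Id IN :recordIds];
        //Heavy queries/DML go here, counted against this job's limits
    }
}

//In the trigger handler: filter, then hand off
Set<Id> needsWork = new Set<Id>();
for (Foo__c f : triggerNew){
    //Collect only records meeting your processing criteria; the same
    //criteria are what keep workflow re-runs from re-enqueueing them
    needsWork.add(f.Id);
}
if (!needsWork.isEmpty()){
    System.enqueueJob(new FooHeavyLiftingJob(needsWork));
}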

The latter can help when you have configuration data you will need in all your batches. It would look something like:

public static List<ConfigObject> configData
{
    get
    {
        if (configData == null)
            configData = [/*query*/];
        return configData;
    }
    private set;
}
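
Because statics survive across all of a trigger's internal batches (as the debug logs above demonstrate), every batch shares that single query. A hypothetical call site, assuming the property lives on a class named ConfigService:

//The first access runs the query; every later batch in the same
//transaction reuses the cached list at no additional query cost
for (ConfigObject config : ConfigService.configData){
    //Apply configuration to the current batch
}

In the 10,000-record example, that turns 50 potential queries into 1.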