[SalesForce] Batch – Sending data from one batch to another

I'm having a little trouble understanding the concept of passing parameters between batches.

I have two batch classes between which I would like to send data.

Batch A creates a Map<sObject,Set<String>> and populates it in the execute method.

After that, I would like to send this Map to batch B.

  1. Should I implement Database.Stateful in this case? I know Database.Stateful is used for passing data between execute() invocations of the same batch, for example, counting or summing a field.

Of course, if the batch is split into more than one execute, I will need Database.Stateful to add the values from the different executes to the Map, but do I need Database.Stateful to pass data between different batches?

  2. What does it mean that Database.Stateful will drastically decrease performance? How does it manifest itself? Only in execution time?

  3. Would it be better to save the data in a custom object and query it in the second batch? (Of course it depends on the case; I'm talking about when I need to pass a Map of sObjects to Sets of Strings.)

  4. Assuming the second batch (Batch_B), executed in the finish method, is stuck in the queue:
    How is the data (Map) saved for this batch? Where is it saved? How can I know that the data will not be deleted or overridden by other batch processes? How much data can I send and save?

  5. Any other best practices and ideas will be much appreciated!

Thanks, and sorry for so many questions.

A simple example is posted below.

BATCH A : ————————————-

global class Batch_A implements Database.Batchable<sObject>, Database.Stateful {
    Map<Sobject,Set<String>> myMap= new Map<Sobject,Set<String>>();
    String query;

    global Batch_A() {

    }

    global Database.QueryLocator start(Database.BatchableContext BC) {
        query = 'SOME QUERY';

        return Database.getQueryLocator(query);
    }

    global void execute(Database.BatchableContext BC, List<Sobject> scope) {
        // After looping through scope and its related records:
        for (SObject record : scope) {
            myMap.put(record, new Set<String>{ /* related values */ });
        }
    }

    global void finish(Database.BatchableContext BC) {
        Id batchJobId = Database.executeBatch(new Batch_B(this.myMap), 200);
    }

}

BATCH B : ————————————-

global class Batch_B implements Database.Batchable<sObject> {
    Map<Sobject,Set<String>> myMap = new Map<Sobject,Set<String>>();
    String query;

    global Batch_B (Map<Sobject,Set<String>> mapValues) {
        // Check the values sent from Batch_A
        System.debug('mapValues In Constructor -> ' + mapValues);
        this.myMap = mapValues;
    }

    global Database.QueryLocator start(Database.BatchableContext BC) {
        // Some start logic
        query = 'SOME QUERY';
        return Database.getQueryLocator(query);
    }

    global void execute(Database.BatchableContext BC, List<sObject> scope) {
       //Some Execute logic
    }

    global void finish(Database.BatchableContext BC) {
         //Some Finish logic
    }

} 

Best Answer

Should I implement Database.Stateful in this case? I know Database.Stateful is used for passing data between execute() invocations of the same batch, for example, counting or summing a field. Of course, if the batch is split into more than one execute, I will need Database.Stateful to add the values from the different executes to the Map, but do I need Database.Stateful to pass data between different batches?

YES. The finish() method executes in its own transaction, so in order to pass the map to the next batch job, you need Database.Stateful.
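As a minimal illustration of what Database.Stateful preserves (the class name, query, and counter are placeholders, not from the original post):

```apex
// Sketch: a stateful member keeps its value across execute() invocations
// and is still populated in finish(), where it can be handed to the next batch.
global class Batch_Counter implements Database.Batchable<sObject>, Database.Stateful {
    global Integer recordCount = 0; // preserved between transactions thanks to Stateful

    global Database.QueryLocator start(Database.BatchableContext BC) {
        return Database.getQueryLocator('SELECT Id FROM Account');
    }

    global void execute(Database.BatchableContext BC, List<sObject> scope) {
        recordCount += scope.size(); // without Database.Stateful this would reset to 0 each chunk
    }

    global void finish(Database.BatchableContext BC) {
        System.debug('Total records processed: ' + recordCount);
        // recordCount could now be passed to a second batch via its constructor
    }
}
```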

What does it mean that Database.Stateful will drastically decrease performance? How does it manifest itself? Only in execution time?

The biggest issue you will have is heap size: if the map is very large, you will exceed the heap limit.
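One way to watch this happen is to log heap usage inside execute(). Limits.getHeapSize() and Limits.getLimitHeapSize() are standard Apex methods; the map population here is illustrative:

```apex
// Sketch: monitoring how much heap the stateful map consumes per chunk.
global void execute(Database.BatchableContext BC, List<SObject> scope) {
    for (SObject record : scope) {
        // illustrative value; in the real batch this comes from related records
        myMap.put(record, new Set<String>{ 'some related value' });
    }
    System.debug('Heap used: ' + Limits.getHeapSize()
        + ' of ' + Limits.getLimitHeapSize());
}
```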

Would it be better to save the data in a custom object and query it in the second batch? (Of course it depends on the case; I'm talking about when I need to pass a Map of sObjects to Sets of Strings.)

This is clearly the most extensible approach and would allow for millions of rows. You would, of course, need a predictable query to find the values created by the first batch, as passing millions of IDs from batch 1 to batch 2 will not work.
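A sketch of that pattern, assuming a hypothetical staging object Batch_Staging__c with fields Record_Id__c and Value__c (none of these names are from the original post):

```apex
// Batch_A.execute(): persist the values instead of holding them in a stateful map.
global void execute(Database.BatchableContext BC, List<SObject> scope) {
    List<Batch_Staging__c> rows = new List<Batch_Staging__c>();
    for (SObject record : scope) {
        // one staging row per related value; Record_Id__c links back to the source record
        rows.add(new Batch_Staging__c(
            Record_Id__c = record.Id,
            Value__c = 'some related value'));
    }
    insert rows;
}

// Batch_B.start(): query the staged rows back; no state is carried between the jobs.
global Database.QueryLocator start(Database.BatchableContext BC) {
    return Database.getQueryLocator(
        'SELECT Record_Id__c, Value__c FROM Batch_Staging__c');
}
```

With this design, neither batch needs Database.Stateful for the hand-off, and the volume is bounded by storage rather than heap.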

Assuming the second batch (Batch_B), executed in the finish method, is stuck in the queue: how is the data (Map) saved for this batch? Where is it saved? How can I know that the data will not be deleted or overridden by other batch processes? How much data can I send and save?

The object for the second batch is serialized by SFDC together with its constructor arguments. Nothing will be lost.
