[SalesForce] Creation of Batches from CSV resulting in erroneous state

I'm trying to create batches from a CSV file written using opencsv's CSVWriter, as follows:
CSVWriter writer = new CSVWriter(new FileWriter(filePath+createFileName), ',', CSVWriter.DEFAULT_QUOTE_CHARACTER);

I then use a BufferedReader to read the written file. The CSV file is written and, I believe, read correctly; so far it works fine. But when I choose particular data to be written to the CSV using the same operations, the creation of batches from it fails.
An exception is thrown stating "Failed to parse CSV. Found unescaped quote. A value with quote should be within a quote", which prevents the application from behaving as expected.

Going through this error, it seems there is some "" (doubled double quote) or " (double quote) symbol present in the data. (The data is in the form "asdf","1.0","",,"def".)
As far as I understand, I should apply a regex to find unescaped double quotes, but I couldn't find any; after examining the file, it doesn't contain repeated double quotes. The link I followed is: https://stackoverflow.com/questions/3180842/regular-expression-to-find-and-replace-unescaped-non-successive-double-quotes-in
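As a point of comparison for the regex approach, the escaping rule Salesforce's parser expects can be applied directly: any quote inside a field must be doubled, and the field itself wrapped in quotes. Below is a minimal sketch (the class name `QuoteEscaper` is hypothetical, not from the original code) of pre-escaping a raw value before writing it out:

```java
// Sketch: escape a raw field value the way a strict CSV parser expects —
// double every embedded quote, then wrap the whole value in quotes.
// For production use, a proper CSV library should handle this instead.
public class QuoteEscaper {
    public static String escapeField(String raw) {
        return "\"" + raw.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(escapeField("say \"hi\"")); // prints "say ""hi"""
    }
}
```

If a value like `say "hi"` reaches the file as `"say "hi""` instead of `"say ""hi"""`, you get exactly the "Found unescaped quote" error described above.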

Thereafter in the code, I use File tmpFile = File.createTempFile("bulkAPIInsert", ".csv"); to hold the data in a temporary file, which is then deleted.

After replacing the above code with the following, I somehow handled that exception, but it further led to another one stating "Failed to parse CSV. EOF reached before closing an opened quote".
File tmpFile = new File("bulkAPIInsert.csv");

I don't think the above workaround should be kept, as it would cause performance issues in the application.

By going through the CSVReader class, I found a custom exception defined with exactly the same message as the one I got. But I think it is thrown when a double quote is found within a double-quoted cell value of the CSV file. The class I referred to is: https://github.com/mulesoft/salesforce-connector/blob/master/src/main/java/com/sforce/async/CSVReader.java

Can anybody suggest where I'm going wrong, or a workaround for this problem?

I'm sharing the code snippets below. Method1 is called first, then Method2.

    Method1: private List<BatchInfo> createBatchesFromCSVFile(RestConnection connection,
            JobInfo jobInfo, String csvFileName) throws Exception {
        List<BatchInfo> batchInfos = new ArrayList<BatchInfo>();
        BufferedReader rdr = new BufferedReader(new InputStreamReader(
                new FileInputStream(csvFileName)));

        // read the CSV header row
        String hdr = rdr.readLine();
        byte[] headerBytes = (hdr + "\n").getBytes("UTF-8");
        int headerBytesLength = headerBytes.length;
//      I was making use of the following code which I replaced with the next line of code.
//      File tmpFile = File.createTempFile("bulkAPIInsert", ".csv");
        File tmpFile = new File("bulkAPIInsert.csv");
        // Split the CSV file into multiple batches
        try {
            FileOutputStream tmpOut = new FileOutputStream(tmpFile);
            int maxBytesPerBatch = 10000000; // 10 million bytes per batch
            int maxRowsPerBatch = 10000; // 10 thousand rows per batch
            int currentBytes = 0;
            int currentLines = 0;
            String nextLine;

            while ((nextLine = rdr.readLine()) != null) {
                byte[] bytes = (nextLine + "\n").getBytes("UTF-8"); //TODO
                if (currentBytes + bytes.length > maxBytesPerBatch
                        || currentLines > maxRowsPerBatch) {
                    createBatch(tmpOut, tmpFile, batchInfos, connection, jobInfo);
                    currentBytes = 0;
                    currentLines = 0;
                }
                if (currentBytes == 0) {
                    tmpOut = new FileOutputStream(tmpFile);
                    tmpOut.write(headerBytes);
                    currentBytes = headerBytesLength;
                    currentLines = 1;
                }
                tmpOut.write(bytes);
                currentBytes += bytes.length;
                currentLines++;
            }

            if (currentLines > 1) {
                createBatch(tmpOut, tmpFile, batchInfos, connection, jobInfo);
            }
        } finally {
            if(!tmpFile.delete())
                tmpFile.deleteOnExit();
            rdr.close();
        }
        return batchInfos;
    }
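One thing to note about Method1: each batch reopens `tmpOut = new FileOutputStream(tmpFile)` without closing the previous stream, and the final stream is never closed at all. If a batch is uploaded while its file still has unflushed bytes, a truncated row could plausibly produce the "EOF reached before closing an opened quote" error. Below is a sketch of the same split logic with the Salesforce types stripped out, where each batch file is written and closed before being handed off (all names here are hypothetical, not part of the original code):

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;

// Sketch: split a CSV file into batch files, repeating the header in each.
// Every batch file is fully written and closed before it is returned, so no
// partially flushed file is ever handed to the uploader.
public class CsvSplitter {
    public static List<File> split(File csv, int maxBytes, int maxRows) throws IOException {
        List<File> batches = new ArrayList<>();
        try (BufferedReader rdr = new BufferedReader(
                new InputStreamReader(new FileInputStream(csv), StandardCharsets.UTF_8))) {
            String header = rdr.readLine();
            byte[] headerBytes = (header + "\n").getBytes(StandardCharsets.UTF_8);
            List<byte[]> current = new ArrayList<>();
            int currentBytes = headerBytes.length;
            String line;
            while ((line = rdr.readLine()) != null) {
                byte[] bytes = (line + "\n").getBytes(StandardCharsets.UTF_8);
                if (currentBytes + bytes.length > maxBytes || current.size() >= maxRows) {
                    batches.add(writeBatch(headerBytes, current, batches.size()));
                    current.clear();
                    currentBytes = headerBytes.length;
                }
                current.add(bytes);
                currentBytes += bytes.length;
            }
            if (!current.isEmpty()) {
                batches.add(writeBatch(headerBytes, current, batches.size()));
            }
        }
        return batches;
    }

    private static File writeBatch(byte[] header, List<byte[]> rows, int n) throws IOException {
        File f = File.createTempFile("bulkAPIInsert" + n, ".csv");
        try (FileOutputStream out = new FileOutputStream(f)) { // closed before upload
            out.write(header);
            for (byte[] row : rows) out.write(row);
        }
        return f;
    }
}
```

This keeps the `createTempFile` approach (a fresh file per batch), which also avoids the fixed-name `bulkAPIInsert.csv` file being truncated and rewritten while an earlier batch might still reference it.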

/**
     * Wait for a job to complete by polling the Bulk API.
     */
    Method2: private void awaitCompletion(RestConnection connection, JobInfo job,
            List<BatchInfo> batchInfoList) throws AsyncApiException { 
        try{
            /****
            Some code
            **/
                BatchInfo[] statusList = connection.getBatchInfoList(job.getId())
                .getBatchInfo();
                for (BatchInfo b : statusList) {
                    if (b.getState() == BatchStateEnum.Completed) {
                        if (incomplete.remove(b.getId())) {
                            // Do something
                        }
                    }
                    else if(b.getState() == BatchStateEnum.Failed){ 

                        System.out.println("Reason: "+b.getStateMessage()+".\n  " +
                                "Number of Records Processed: "+b.getNumberRecordsProcessed());
                        throw (new Exception(""));
                    }
                }
            }
        }catch(Exception ex){log.debug(" Exception occurred.");}
    }

The getStateMessage() method of BatchInfo returns the error messages discussed above.

Best Answer

Have you tried Marty Y. Chang's CSVReader? I also have a process that reads a CSV file using a batch Apex process, and I didn't run into any issue with this code. Just take into account:
1. The scope argument in the execute method should be a List<String> where each String is a whole row of your file. With this data you can easily call
List<List<String>> csvLines = CSVReader.readCSVFile(csvFile, m_parser);
2. To get this result, you also need a method that returns that List, which should live in a new class that implements Iterator and Iterable.
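The Iterable/Iterator pattern described in point 2 follows the same contract in Apex as in Java, so it can be sketched in Java here (all names are hypothetical, for illustration only): the batch job iterates the file row by row, each iteration yielding one whole row as a String.

```java
import java.util.Arrays;
import java.util.Iterator;

// Sketch of the Iterable the answer describes: iterating it yields one whole
// CSV row per step, which a batch execute method can then pass to a parser.
// In Apex, the equivalent class implements Iterable<String>/Iterator<String>
// and is returned from the batch class's start method.
public class CsvRowIterable implements Iterable<String> {
    private final String[] rows;

    public CsvRowIterable(String fileBody) {
        // Split the raw file body into rows; real code would normalize
        // line endings (\r\n vs \n) first.
        this.rows = fileBody.split("\n");
    }

    @Override
    public Iterator<String> iterator() {
        return Arrays.asList(rows).iterator();
    }
}
```

With this shape, each chunk the batch framework hands to execute is a list of whole rows, matching the List<String> scope described in point 1.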
