[SalesForce] Exporting Data to CSV file via Bulk API

I am having trouble with my Bulk API code. What I want is to take the records in the Opportunity object, and a few of their fields, and export them to a csv file for reading. The code is currently as follows.

public class BulkExample {

private BulkConnection bulkConnection = null;

public static void main(String[] args)
  throws AsyncApiException, ConnectionException, IOException {
    BulkExample example = new BulkExample();
    example.doBulkQuery();
}

public boolean login() {
    boolean success = false;

    String userId = "VALID LOGIN";
    String passwd = "VALID PASSWORD";
    String soapAuthEndPoint = "https://test.salesforce.com/services/Soap/u/29.0";
    String bulkAuthEndPoint = "https://cs20-api.salesforce.com/services/async/29.0";
    try {
      ConnectorConfig config = new ConnectorConfig();
      config.setUsername(userId);
      config.setPassword(passwd);
      config.setAuthEndpoint(soapAuthEndPoint);
      config.setCompression(true);
      config.setTraceFile("traceLogs.txt");
      config.setTraceMessage(true);
      config.setPrettyPrintXml(true);
      config.setRestEndpoint(bulkAuthEndPoint);
      System.out.println("AuthEndpoint: " + 
          config.getRestEndpoint());
      PartnerConnection connection = new PartnerConnection(config);
      System.out.println("SessionID: " + config.getSessionId());
      bulkConnection = new BulkConnection(config);
      success = true;
    } catch (AsyncApiException aae) {
      aae.printStackTrace();
    } catch (ConnectionException ce) {
      ce.printStackTrace();
    } catch (FileNotFoundException fnfe) {
      fnfe.printStackTrace();
    }
    return success;
  }

  public void doBulkQuery(){
    if ( ! login() ) {
    }
    try {
      JobInfo job = new JobInfo();
      job.setObject("Opportunity");

      job.setOperation(OperationEnum.query);
      job.setConcurrencyMode(ConcurrencyMode.Parallel);
      job.setContentType(ContentType.CSV
              );

      job = bulkConnection.createJob(job);
      assert job.getId() != null;

      job = bulkConnection.getJobStatus(job.getId());

      String query = 
          "SELECT Name, Id FROM Opportunity";

      long start = System.currentTimeMillis();

      BatchInfo info = null;
      ByteArrayInputStream bout = 
          new ByteArrayInputStream(query.getBytes());
      info = bulkConnection.createBatchFromStream(job, bout);

      String[] queryResults = null;

      for(int i=0; i<10000; i++) {
        Thread.sleep(i==0 ? 30 * 1000 : 30 * 1000); //30 sec
        info = bulkConnection.getBatchInfo(job.getId(), 
            info.getId());

        if (info.getState() == BatchStateEnum.Completed) {
          QueryResultList list = 
              bulkConnection.getQueryResultList(job.getId(), 
                  info.getId());
          queryResults = list.getResult();
          break;
        } else if (info.getState() == BatchStateEnum.Failed) {
          System.out.println("-------------- failed ----------" 
              + info);
          break;
        } else {
          System.out.println("-------------- waiting ----------" 
              + info);
        }
      }

      if (queryResults != null) {
        for (String resultId : queryResults) {
          bulkConnection.getQueryResultStream(job.getId(), 
              info.getId(), resultId);
        }
        System.out.println("Output complete, please check output file.");
      }
    } catch (AsyncApiException aae) {
      aae.printStackTrace();
    } catch (InterruptedException ie) {
      ie.printStackTrace();
    }
  }
}

It logs in correctly to the sandbox as it should, and it pulls the data as well but when it does it pulls a whole bunch of extra stuff with it. In traceLogs.csv:

WSC: Creating a new connection to https://test.salesforce.com/services/Soap/u/29.0 Proxy 

= DIRECT username null  
------------ Request start   ---------- 
<?xml version="1.0" encoding="UTF-8"?><env:Envelope 
   xmlns:env="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema"   
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">   
 <env:Body> 
  <m:login xmlns:m="urn:partner.soap.sforce.com" xmlns:sobj="urn:sobject.partner.soap.sforce.com">  
   <m:username>VALID LOGIN</m:username> 
   <m:password>VALID PASSWORD</m:password>  
  </m:login>    
 </env:Body>    
</env:Envelope> 
------------ Request end   ----------   
null=[HTTP/1.1 200 OK]  
Date=[Mon    13 Jan 2014 14:16:24 GMT]
Content-Length=[798]    
Content-Encoding=[gzip] 
Content-Type=[text/xml;charset=UTF-8]   
------------ Response start ----------  
  <?xml version="1.0" encoding="UTF-8"?>    
    <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns="urn:partner.soap.sforce.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">  
      <soapenv:Body>    
        <loginResponse> 
          <result>  
            <metadataServerUrl>https://cs20.salesforce.com/services/Soap/m/29.0/00Dm00000000VYv</metadataServerUrl> 
            <passwordExpired>false</passwordExpired>    
            <sandbox>true</sandbox> 
            <serverUrl>https://cs20.salesforce.com/services/Soap/u/29.0/00Dm00000000VYv</serverUrl> 
            <sessionId>00Dm00000000VYv!AQkAQKplEY1Dg7JtE6Dl45I3J4Nofn7MqPYsOuaHLXjqCpSUqFnSFm.DU_7l1w8RZE_YSXXDQmY96YwVxRHtyRzCE7m_4OvI</sessionId> 
            <userId>00560000002990JAAQ</userId> 
            <userInfo>  
              <accessibilityMode>false</accessibilityMode>  
              <currencySymbol>$</currencySymbol>    
              <orgAttachmentFileSizeLimit>5242880</orgAttachmentFileSizeLimit>  
              <orgDefaultCurrencyIsoCode>USD</orgDefaultCurrencyIsoCode>    
              <orgDisallowHtmlAttachments>false</orgDisallowHtmlAttachments>    
              <orgHasPersonAccounts>false</orgHasPersonAccounts>    
              <organizationId>00Dm00000000VYvEAM</organizationId>   
              <organizationMultiCurrency>false</organizationMultiCurrency>  
              <organizationName>Lumos Networks</organizationName>   
              <profileId>00e6000000156VFAAY</profileId> 
              <roleId xsi:nil="true"/>  
                <sessionSecondsValid>14400</sessionSecondsValid>    
                <userDefaultCurrencyIsoCode xsi:nil="true"/>    
                  <userEmail>millerjm@lumosnet.com</userEmail>  
                  <userFullName>John M Miller</userFullName>    
                  <userId>00560000002990JAAQ</userId>   
                  <userLanguage>en_US</userLanguage>    
                  <userLocale>en_US</userLocale>    
                  <userName>millerjm@lumosnet.com.jmillerbox</userName> 
                  <userTimeZone>America/New_York</userTimeZone> 
                  <userType>Standard</userType> 
                  <userUiSkin>Theme3</userUiSkin>   
              </userInfo>   
            </result>   
          </loginResponse>  
        </soapenv:Body> 
      </soapenv:Envelope>   
------------ Response end   ----------  
WSC: Creating a new connection to https://cs20-api.salesforce.com/services/async/29.0/job/ Proxy = DIRECT username null 
------------ Request start   ---------- 
<?xml version="1.0" encoding="UTF-8"?><jobInfo  
   xmlns="http://www.force.com/2009/06/asyncapi/dataload">  
 <operation>query</operation>   
 <object>Opportunity</object>   
 <concurrencyMode>Parallel</concurrencyMode>    
 <contentType>CSV</contentType> 
</jobInfo>  
------------ Request end   ----------   
null=[HTTP/1.1 201 Created] 
Transfer-Encoding=[chunked] 
Date=[Mon    13 Jan 2014 14:16:24 GMT]
Content-Encoding=[gzip] 
Location=[/services/async/29.0/job/750m000000003aAAAQ]  
Content-Type=[application/xml]  
------------ Response start ----------  
  <?xml version="1.0" encoding="UTF-8"?>    
    <jobInfo    
   xmlns="http://www.force.com/2009/06/asyncapi/dataload">  
 <id>750m000000003aAAAQ</id>    
 <operation>query</operation>   
 <object>Opportunity</object>   
 <createdById>00560000002990JAAQ</createdById>  
 <createdDate>2014-01-13T14:16:24.000Z</createdDate>    
 <systemModstamp>2014-01-13T14:16:24.000Z</systemModstamp>  
 <state>Open</state>    
 <concurrencyMode>Parallel</concurrencyMode>    
 <contentType>CSV</contentType> 
 <numberBatchesQueued>0</numberBatchesQueued>   
 <numberBatchesInProgress>0</numberBatchesInProgress>   
 <numberBatchesCompleted>0</numberBatchesCompleted> 
 <numberBatchesFailed>0</numberBatchesFailed>   
 <numberBatchesTotal>0</numberBatchesTotal> 
 <numberRecordsProcessed>0</numberRecordsProcessed> 
 <numberRetries>0</numberRetries>   
 <apiVersion>29.0</apiVersion>  
 <numberRecordsFailed>0</numberRecordsFailed>   
 <totalProcessingTime>0</totalProcessingTime>   
 <apiActiveProcessingTime>0</apiActiveProcessingTime>   
 <apexProcessingTime>0</apexProcessingTime> 
</jobInfo>  
------------ Response end   ----------  
WSC: Creating a new connection to https://cs20-api.salesforce.com/services/async/29.0/job/750m000000003aAAAQ Proxy = DIRECT username null   
https://cs20-api.salesforce.com/services/async/29.0/job/750m000000003aAAAQ  
null: HTTP/1.1 200 OK   
Transfer-Encoding: chunked  
Date: Mon    13 Jan 2014 14:16:25 GMT
Content-Encoding: gzip  
Content-Type: application/xml   
------------ Response start ----------  
  <?xml version="1.0" encoding="UTF-8"?>    
    <jobInfo    
   xmlns="http://www.force.com/2009/06/asyncapi/dataload">  
 <id>750m000000003aAAAQ</id>    
 <operation>query</operation>   
 <object>Opportunity</object>   
 <createdById>00560000002990JAAQ</createdById>  
 <createdDate>2014-01-13T14:16:24.000Z</createdDate>    
 <systemModstamp>2014-01-13T14:16:24.000Z</systemModstamp>  
 <state>Open</state>    
 <concurrencyMode>Parallel</concurrencyMode>    
 <contentType>CSV</contentType> 
 <numberBatchesQueued>0</numberBatchesQueued>   
 <numberBatchesInProgress>0</numberBatchesInProgress>   
 <numberBatchesCompleted>0</numberBatchesCompleted> 
 <numberBatchesFailed>0</numberBatchesFailed>   
 <numberBatchesTotal>0</numberBatchesTotal> 
 <numberRecordsProcessed>0</numberRecordsProcessed> 
 <numberRetries>0</numberRetries>   
 <apiVersion>29.0</apiVersion>  
 <numberRecordsFailed>0</numberRecordsFailed>   
 <totalProcessingTime>0</totalProcessingTime>   
 <apiActiveProcessingTime>0</apiActiveProcessingTime>   
 <apexProcessingTime>0</apexProcessingTime> 
</jobInfo>  
------------ Response end   ----------  
WSC: Creating a new connection to https://cs20-api.salesforce.com/services/async/29.0/job/750m000000003aAAAQ/batch Proxy = DIRECT username null 
------------ Request start   ---------- 
SELECT Name  Id FROM Opportunity
------------ Request end   ----------   
null=[HTTP/1.1 201 Created] 
Transfer-Encoding=[chunked] 
Date=[Mon    13 Jan 2014 14:16:25 GMT]
Content-Encoding=[gzip] 
Location=[/services/async/29.0/job/750m000000003aAAAQ/batch/751m000000005vyAAA] 
Content-Type=[application/xml]  
------------ Response start ----------  
  <?xml version="1.0" encoding="UTF-8"?>    
    <batchInfo  
   xmlns="http://www.force.com/2009/06/asyncapi/dataload">  
 <id>751m000000005vyAAA</id>    
 <jobId>750m000000003aAAAQ</jobId>  
 <state>Queued</state>  
 <createdDate>2014-01-13T14:16:25.000Z</createdDate>    
 <systemModstamp>2014-01-13T14:16:25.000Z</systemModstamp>  
 <numberRecordsProcessed>0</numberRecordsProcessed> 
 <numberRecordsFailed>0</numberRecordsFailed>   
 <totalProcessingTime>0</totalProcessingTime>   
 <apiActiveProcessingTime>0</apiActiveProcessingTime>   
 <apexProcessingTime>0</apexProcessingTime> 
</batchInfo>    
------------ Response end   ----------  
WSC: Creating a new connection to https://cs20-api.salesforce.com/services/async/29.0/job/750m000000003aAAAQ/batch/751m000000005vyAAA Proxy = DIRECT username null  
https://cs20-api.salesforce.com/services/async/29.0/job/750m000000003aAAAQ/batch/751m000000005vyAAA 
null: HTTP/1.1 200 OK   
Transfer-Encoding: chunked  
Date: Mon    13 Jan 2014 14:16:55 GMT
Content-Encoding: gzip  
Content-Type: application/xml   
------------ Response start ----------  
  <?xml version="1.0" encoding="UTF-8"?>    
    <batchInfo  
   xmlns="http://www.force.com/2009/06/asyncapi/dataload">  
 <id>751m000000005vyAAA</id>    
 <jobId>750m000000003aAAAQ</jobId>  
 <state>Completed</state>   
 <createdDate>2014-01-13T14:16:25.000Z</createdDate>    
 <systemModstamp>2014-01-13T14:16:26.000Z</systemModstamp>  
 <numberRecordsProcessed>1</numberRecordsProcessed> 
 <numberRecordsFailed>0</numberRecordsFailed>   
 <totalProcessingTime>0</totalProcessingTime>   
 <apiActiveProcessingTime>0</apiActiveProcessingTime>   
 <apexProcessingTime>0</apexProcessingTime> 
</batchInfo>    
------------ Response end   ----------  
WSC: Creating a new connection to https://cs20-api.salesforce.com/services/async/29.0/job/750m000000003aAAAQ/batch/751m000000005vyAAA/result Proxy = DIRECT username null   
https://cs20-api.salesforce.com/services/async/29.0/job/750m000000003aAAAQ/batch/751m000000005vyAAA/result  
null: HTTP/1.1 200 OK   
Date: Mon    13 Jan 2014 14:16:55 GMT
Content-Length: 111 
Content-Encoding: gzip  
Content-Type: application/xml   
------------ Response start ----------  
  <result-list xmlns="http://www.force.com/2009/06/asyncapi/dataload">  
    <result>752m00000004CHq</result>    
</result-list>  
------------ Response end   ----------  
WSC: Creating a new connection to https://cs20-api.salesforce.com/services/async/29.0/job/750m000000003aAAAQ/batch/751m000000005vyAAA/result/752m00000004CHq Proxy = DIRECT username null   
https://cs20-api.salesforce.com/services/async/29.0/job/750m000000003aAAAQ/batch/751m000000005vyAAA/result/752m00000004CHq  
null: HTTP/1.1 200 OK   
Transfer-Encoding: chunked  
Date: Mon    13 Jan 2014 14:16:55 GMT
Content-Encoding: gzip  
Content-Type: text/csv; charset=UTF-8   
------------ Response start ----------  
Name    Id
Test fields Testville O-013702  006m0000001rLDdAAM

------------ Response end   ----------  

Obviously a lot of this is useless. All I want is the very last bit. Any help would be very much appreciated. I have been through all documentation I can find on this including all of the official Salesforce Bulk API guides. If there is any clarification I can give, please feel free to ask.

EDIT: Thanks to the best solution, I was able to make a finished polished product. Below are links to a fully commented and documented source code for you to download and compile yourself. All you need to do is substitute your own login and org information. Make sure when you try to compile this, you have the input.txt file in the same directory so your IDE can see it.

BulkExample.java: http://pastebin.com/ZfsMcSe6

FileHandler.java: http://pastebin.com/PQ8V0hky

QueryHandler.java: http://pastebin.com/69gKzm3k

InputHandler.java: http://pastebin.com/99uAegiY

Sample Input.txt: http://pastebin.com/ZYCnQte9

Please feel free to use this code for your projects or distribute it elsewhere. All I ask is that you leave the credit in the header of BulkExample.java

Thank you and good luck!

Best Answer

It looks like that's the debug trace file, traceLogs.txt. Unfortunately, the sample isn't very clear - bulkConnection.getQueryResultStream() returns the actual data as an input stream. You need to extend your code to

  • Create a CSV file
  • Get an output stream for the CSV file
  • Read data from the input stream returned by bulkConnection.getQueryResultStream() and write it to the output stream.

Once you have it all working, you will probably want to suppress creation of the trace file by changing

config.setTraceMessage(true);

to

config.setTraceMessage(false);