[SalesForce] How to correctly read CSV file, BLOB not UTF-8 string

I have a CSV file which I am trying to read using Apex, the file looks like this:

COLUMN1;COLUMN2;COLUMN3;COLUMN4;COLUMN5;COLUMN6;COLUMN7
VALUE1;VALUE2;VALUE3;VALUE4;VALUE5;VALUE6;VALUE7;

The file is stored as an attachment which I query like this and try to read:

Attachment csvFile = [SELECT Body FROM Attachment WHERE ParentId =: myFileId LIMIT 1];
String csvContent = csvFile.Body.toString();

This returns BLOB is not a valid UTF-8 string

I've also tried encoding it like this:

Blob csvBlob = csvFile.Body;
String csvContent = EncodingUtil.base64Encode(csvBlob);

But this brings back random characters in the csvContent variable, likely because its been encoded.

Can anyone advise the correct way to "decode" this?

Best Answer

For the bytes you shared, the text came out as:

STAMP_DATE;CUSNO;FIRSTNAME;LASTNAME;COMPANYNAME;ROWTEXT1;ROWTEXT2;ROWTEXT3;CUSTOMER_TYPE;CUSTOMER_TITLE;CUS

Which means that the problem isn't at the beginning. What's going on is that you've got a stray byte somewhere that isn't UTF-8 compatible, and the upper range of ASCII characters (0x80-0xFF) have a different meaning in Unicode (mostly invalid combinations, like 0x80 0x00 would be invalid).

Without seeing the entire file, we really have no idea what's broken, but if I had to guess, it's most likely a character that has some sort of decoration (e.g. Ñ), which in ASCII are one byte, but in Unicode are two (or more) bytes. When invalid characters appear in a Unicode string, Apex code fails with that exception, as it cannot understand ASCII/ISO-8859-1 encoding except in some special circumstances.

If your file is in ISO-8859-1 format, you'll first need to convert it to be Unicode compliant. Most likely, that means opening it up in a text editor (not Notepad, but a good one, like Programmer's Notepad or Notepad++), change the encoding to Unicode, upload the new file to Salesforce, and try again. Or, you can try to decode it using something like in this answer. Your results may vary.

Related Topic