[Salesforce] Unable to read an ANSI file in Apex code

I need to upload an ANSI-encoded file from a Visualforce page, then read it and act on its contents in the Apex controller class. When I upload an ANSI file, a StringException is thrown:

BLOB is not a valid UTF-8 string.

VF page:

<apex:page Controller="FileParser">
    <apex:include pageName="clcommon__mintTheme"/>
    <apex:sectionHeader title="Upload File" />
    <apex:form >
        <apex:pageBlock >
            <apex:pageBlockButtons location="top">
                <apex:pageMessages escape="false"/>
                <apex:pageBlockSection columns="1">
                    <apex:inputFile value="{!bFile}"/>
                    <apex:commandButton value="Upload" action="{!processReturnFile}"/>
                </apex:pageBlockSection>
            </apex:pageBlockButtons>
        </apex:pageBlock>
    </apex:form>
</apex:page>

Controller:

global with sharing class FileParser{
    public Blob bFile {get; set; }

    public PageReference processReturnFile(){
        // Throws StringException when the blob is not valid UTF-8
        String strFile = bFile.toString();
        System.debug(strFile);
        return null; // stay on the current page
    }
}

The StringException is thrown by this line:

String strFile = bFile.toString();

To create an ANSI test file:

  1. Type some text containing non-ASCII characters, such as SKårebj ø, and save the file.
  2. Use an online converter (like this one) to convert it to ANSI or ISO-8859-1.
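
The two steps above can also be reproduced in code. Apex cannot be run outside an org, so this is only a minimal Java sketch; the names AnsiSample, latin1Bytes and isValidUtf8 are made up for illustration. It encodes the sample text as ISO-8859-1 and then checks the bytes with a strict UTF-8 decoder, which is presumably similar to the check that fails inside bFile.toString():

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original question or answer.
class AnsiSample {

    // Encode text as ISO-8859-1 (one byte per character).
    static byte[] latin1Bytes(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Return true only if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file itself pure ASCII.
        String sample = "SK\u00E5rebj \u00F8";
        System.out.println(isValidUtf8(latin1Bytes(sample)));
        System.out.println(isValidUtf8(sample.getBytes(StandardCharsets.UTF_8)));
    }
}
```

In ISO-8859-1 the å is the single byte 0xE5. UTF-8 treats 0xE5 as the start of a three-byte sequence, and the next byte (r, 0x72) is not a continuation byte, so a strict decoder rejects the data.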

I need the controller (Apex class) to be able to read the text of this file, i.e. the System.debug(strFile); call in the code above should print its contents.

Best Answer

I hacked together an Apex charset encoder/decoder utility; see the gist Charset.cls.

/**
 * Convenience method that decodes bytes in charset into a string of Unicode
 * characters.
 * <p>
 * @param  input binary characters in charset
 * @param  charset name according to http://www.iana.org/assignments/character-sets/character-sets.xhtml
 * @return string of Unicode characters
 */
public static String decode(final Blob input, final String charset){
    final String hex = EncodingUtil.convertToHex(input);
    final Integer size = hex.length() >> 1;
    final List<String> bytes = new String[size];

    for (Integer i = 0; i < size; ++i) {
        bytes.set(i, hex.mid(i << 1, 2));
    }
    return EncodingUtil.urlDecode('%' + String.join(bytes, '%'), charset);
}
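
For readers who want to try the trick outside an org, here is a rough Java analogue of the decode method above. It is a sketch only: PercentDecodeSketch is a made-up name, and java.net.URLDecoder stands in for EncodingUtil.urlDecode on the assumption that both treat a run of %XX escapes as raw bytes in the named charset.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical Java counterpart of Charset.decode, for illustration only.
class PercentDecodeSketch {

    // Turn every byte into a %XX escape, then let the URL decoder
    // interpret the escaped bytes in the requested charset.
    static String decode(byte[] input, String charset) {
        StringBuilder encoded = new StringBuilder(input.length * 3);
        for (byte b : input) {
            encoded.append('%').append(String.format("%02X", b & 0xFF));
        }
        try {
            return URLDecoder.decode(encoded.toString(), charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = {0x53, 0x4B, (byte) 0xE5}; // "SKå" in ISO-8859-1
        System.out.println(decode(latin1, "ISO-8859-1"));
    }
}
```

The intermediate string holds three characters per input byte, which is one reason the approach only suits fairly small files.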

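This typically works for file/blob sizes below about 200 kB, depending on the heap and regex limits in effect: the hex string takes two characters per byte and the percent-encoded string three.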

Add Charset.cls to your org and change your controller to call Charset.decode with a valid charset name, e.g. ISO-8859-1.

global with sharing class FileParser {
    public Blob bFile {get; set;}

    public PageReference processReturnFile() {
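        // Decode the uploaded bytes as ISO-8859-1 instead of UTF-8
        String strFile = Charset.decode(bFile, 'ISO-8859-1');
        System.debug(strFile);
        return null; // stay on the current page
    }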
}