[SalesForce] Email html body to plain text body

I have an Email Service set up. It parses incoming email, extracts the values it needs, and writes them to the proper fields on the proper objects.
The entire plain text body is saved. However, an email may arrive with no plain text body, and in that case I want to convert the HTML body to plain text.
If anyone knows of open source code that does this, that would be wonderful. External services are OK too, but I would prefer custom logic.

Test class:

@isTest
private class HtmlToPlainTextConverter_UT {

  private static String validHtml = ''
    +'<!DOCTYPE html>'
    +'<html>'
    +'<head>'
      +'<title>Hi there</title>'
    +'</head>'
    +'<body>'
      +'This is a page</br>'
      +'a simple page'
      +'<table>'
      +'<tr>'
      +'<td>Col 1.1</td>'
      +'<td>Col 1.2</td>'
      +'</tr>'
      +'<tr>'
      +'<td>Col 2.1</td>'
      +'<td>Col 2.2</td>'
      +'</tr>'
      +'</table>'
    +'</body>'
    +'</html>'
  +'';

  private static String validConvertedHtml = ''
    +'Hi there\n'
    +'This is a page\n'
    +'a simple page\n'
    +'Col 1.1 Col 1.2 \n'
    +'Col 2.1 Col 2.2'
  +'';

  @isTest(SeeAllData=false)
  private static void validConvertTest() {
    System.assertEquals(validConvertedHtml, HtmlToPlainTextConverter.convert(validHtml));
  }
}

Converter class:

/**
*
* @description Class contains methods for converting html to plain text
*
*
* @author Andrii Muzychuk
* @date 12/23/2014
*
*/
global class HtmlToPlainTextConverter {

  private static String anyOpenHtmlTag = '<\\W{0,1}\\w+\\s*\\w*>';

  // Patterns are applied in this order; every entry must also be a key in convertPatterns.
  // Note: '</td>' and '</\\w+>' run first and consume all closing tags, so the later
  // patterns that contain a closing tag can never match. Reorder if they are needed.
  private static String [] patternsApplyOrder = new String [] {
    '</td>',
    '</\\w+>',
    '</[hH][0-9]{0,1}>',

    '<tr\\s*(valign=".{1,20}")*\\s*>',
    '<td\\s+colspan="[0-9]"\\s*>(\\s*|&nbsp;)\\s*</td>',
    '<td\\s*(style=".{1,40}")*\\s*>(\\s*|&nbsp;)\\s*</td>',

    anyOpenHtmlTag
  };

  // Replacement text for each pattern listed in patternsApplyOrder
  private static Map<String, String> convertPatterns = new Map<String, String> {
    '</td>' => ' ',
    '</\\w+>' => '\n', // any closing tag is converted to a new line
    '</[hH][0-9]{0,1}>' => '\n',

    '<tr\\s*(valign=".{1,20}")*\\s*>' => '\n',
    '<td\\s+colspan="[0-9]"\\s*>(\\s*|&nbsp;)\\s*</td>' => '\n --- \n',
    '<td\\s*(style=".{1,40}")*\\s*>(\\s*|&nbsp;)\\s*</td>' => '\t',

    anyOpenHtmlTag => ''
  };

  /**
  *
  * @description Method removes html tags or replaces them with line break
  *
  * @param htmlToConvert
  *
  *
  * @usage HtmlToPlainTextConverter.convert(htmlToConvert);
  *
  * @author Andrii Muzychuk
  * @date 12/23/2014
  *
  */
  global static String convert(String htmlToConvert) {
    String plainText = htmlToConvert;

    for (String convertPatternKey : patternsApplyOrder) {
        plainText = plainText.replaceAll(convertPatternKey, convertPatterns.get(convertPatternKey));
    }

    return plainText.trim();
  }

}

Best Answer

Use the built-in String instance method stripHtmlTags(). It removes the HTML markup from the string and returns plain text, so no custom regex logic is needed.

String1.stripHtmlTags()
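
For the case described in the question (fall back to the HTML body only when the plain text body is missing), a minimal sketch could look like the following. The class name EmailBodyResolver and the method name resolve are hypothetical and exist only for illustration; stripHtmlTags() and String.isNotBlank() are standard String methods.

```apex
// Hypothetical helper: the class and method names are illustrative only.
public class EmailBodyResolver {

  // Returns the plain text body when it is present; otherwise returns the
  // HTML body with its markup stripped. Never returns null.
  public static String resolve(String plainTextBody, String htmlBody) {
    if (String.isNotBlank(plainTextBody)) {
      return plainTextBody;
    }
    if (String.isNotBlank(htmlBody)) {
      // Standard String instance method that removes HTML markup.
      return htmlBody.stripHtmlTags();
    }
    return '';
  }
}
```

Inside the email handler this would be called as EmailBodyResolver.resolve(email.plainTextBody, email.htmlBody), where email is the Messaging.InboundEmail passed to handleInboundEmail. How line breaks and table cells come out of stripHtmlTags() may differ from the custom converter above, so it is worth checking the result against a few real emails.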