I have an Email Service set up. It parses incoming email, extracts the needed values, and posts them to the proper fields on the proper objects.
The entire plain text body is saved. However, an email may arrive with no plain text body, and in that case I want to convert the HTML body to plain text.
If anyone knows of open source code for this, that would be wonderful. External services are OK too, but I would prefer to go with custom logic.
Test class:
@isTest
private class HtmlToPlainTextConverter_UT {
private static String validHtml = ''
+'<!DOCTYPE html>'
+'<html>'
+'<head>'
+'<title>Hi there</title>'
+'</head>'
+'<body>'
+'This is a page</br>'
+'a simple page'
+'<table>'
+'<tr>'
+'<td>Col 1.1</td>'
+'<td>Col 1.2</td>'
+'</tr>'
+'<tr>'
+'<td>Col 2.1</td>'
+'<td>Col 2.2</td>'
+'</tr>'
+'</table>'
+'</body>'
+'</html>'
+'';
private static String validConvertedHtml = ''
+'Hi there\n'
+'This is a page\n'
+'a simple page\n'
+'Col 1.1 Col 1.2 \n'
+'Col 2.1 Col 2.2'
+'';
@isTest(SeeAllData=false)
private static void validConvertTest() {
System.assertEquals(validConvertedHtml, HtmlToPlainTextConverter.convert(validHtml));
}
}
Converter class:
/**
*
* @description Class contains methods for converting html to plain text
*
*
* @author Andrii Muzychuk
* @date 12/23/2014
*
*/
global class HtmlToPlainTextConverter {
private static String anyOpenHtmlTag = '<\\W{0,1}\\w+\\s*\\w*>';
// array is used to store the order in which patterns are applied
private static String [] patternsApplyOrder = new String [] {
'</td>',
'</\\w+>',
'</[hH][0-9]{0,1}>',
'<tr\\s*(valign=".{1,20}")*\\s*>',
'<td\\s+colspan="[0-9]"\\s*>(\\s*| )\\s*</td>',
'<td\\s*(style=".{1,40}")*\\s*>(\\s*| )\\s*</td>',
anyOpenHtmlTag
};
// keys must match the entries of patternsApplyOrder exactly
private static Map<String, String> convertPatterns = new Map<String, String> {
'</td>' => ' ',
'</\\w+>' => '\n', // any closing tag convert to new line
'</[hH][0-9]{0,1}>' => '\n',
'<tr\\s*(valign=".{1,20}")*\\s*>' => '\n',
'<td\\s+colspan="[0-9]"\\s*>(\\s*| )\\s*</td>' => '\n --- \n',
'<td\\s*(style=".{1,40}")*\\s*>(\\s*| )\\s*</td>' => '\t',
anyOpenHtmlTag => ''
};
/**
*
* @description Method removes html tags or replaces them with line break
*
* @param htmlToConvert
*
*
* @usage HtmlToPlainTextConverter.convert(htmlToConvert);
*
* @author Andrii Muzychuk
* @date 12/23/2014
*
*/
global static String convert(String htmlToConvert) {
String plainText = htmlToConvert;
for (String convertPatternKey : patternsApplyOrder) {
plainText = plainText.replaceAll(convertPatternKey, convertPatterns.get(convertPatternKey));
}
return plainText.trim();
}
}
Best Answer
Use the String method stripHtmlTags() and it will do the work for you.
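As a minimal sketch of how this could fit the email handler described in the question: the class name EmailBodyExtractor and the method plainBody are hypothetical, made up for illustration. The code assumes the handler receives a standard Messaging.InboundEmail and prefers its plainTextBody, falling back to htmlBody with the tags stripped.

```apex
// Hypothetical helper (not from the original post): picks a plain-text
// body for an inbound email.
public class EmailBodyExtractor {
    public static String plainBody(Messaging.InboundEmail email) {
        // Prefer the plain text part when the sender supplied one.
        if (String.isNotBlank(email.plainTextBody)) {
            return email.plainTextBody;
        }
        // Otherwise fall back to the HTML part with its markup removed.
        if (String.isNotBlank(email.htmlBody)) {
            return email.htmlBody.stripHtmlTags();
        }
        // Neither part is present.
        return '';
    }
}
```

Note that stripHtmlTags() only removes markup, so the result may not keep the line and column layout that the custom converter's test expects. If that layout matters, one option is to replace row and line-break tags with '\n' before calling it.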