[Salesforce] Convert HTML to PDF in Apex

I am trying to convert an HTML email that is sent to an Email Service into a PDF file. Some of the options I have explored:

attachment.fileName = fileName + '.pdf';
attachment.mimeTypeSubType = 'application/pdf';
attachment.body = Blob.toPdf(attachmentBody);

This works well with text and some CSS, but when I send a rich HTML email with images and heavy styling it fails with the error: System.InvalidParameterValueException: An error occurred while parsing the input string.

As a second option, I tried passing the HTML from the email to a Visualforce page, rendering it as a PDF and using the result as an attachment. However, the Email Service has limitations that prevent using Visualforce pages, and the HTML markup from the email raised validation exceptions because some HTML tags were not closed properly.

I know there are lots of paid HTML-to-PDF services on the web with APIs that you can call, but I'd like to explore any alternative Salesforce options before resorting to one of them or building my own HTML-to-PDF web app.

Cheers

Best Answer

I would parse the blob into a string and strip certain kinds of HTML tags, for the following reasons:

  1. PDF rendering doesn't support images encoded in the data: URI scheme format.
  2. Components that rely on JavaScript should not be used.
  3. Components that depend on Salesforce stylesheets should not be used.
  4. Tags that rely on JavaScript or an external URI are going to confuse the PDF generator.
  5. A tag that refers to an external stylesheet that is not a static resource is also going to be problematic.

I suggest you look at the Best Practices for Rendering PDFs in the Salesforce developer documentation, and much of my reasoning will become readily apparent. Creating PDFs from known input data can be tricky enough as it is. Rendering them from sources you have no control over will be hit or miss unless you first strip extraneous tags and links to embedded external content.
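
The stripping step described above could be sketched roughly as follows. This is only a minimal illustration, not a Salesforce-provided API: the class name HtmlPdfSanitizer, both method names, and the regular expressions are all assumptions. The choice to drop every image (rather than only data: URI or external ones) is a deliberately conservative guess. Only String.replaceAll, String.isBlank and Blob.toPdf are standard Apex calls.

```apex
// Hypothetical helper class; the name, methods and patterns are illustrative only.
public class HtmlPdfSanitizer {

    // Removes the kinds of markup the list above warns about.
    public static String sanitize(String html) {
        if (String.isBlank(html)) {
            return '';
        }
        String cleaned = html;
        // Drop <script> blocks together with their contents (JavaScript is not supported).
        cleaned = cleaned.replaceAll('(?is)<script\\b[^>]*>.*?</script>', '');
        // Drop <link> tags, which usually point at external stylesheets.
        cleaned = cleaned.replaceAll('(?i)<link\\b[^>]*>', '');
        // Drop all <img> tags (assumption: covers both data: URIs and external URLs).
        cleaned = cleaned.replaceAll('(?i)<img\\b[^>]*>', '');
        // Drop quoted inline event handlers such as onclick="...".
        cleaned = cleaned.replaceAll('(?i)\\s+on[a-z]+\\s*=\\s*("[^"]*"|\'[^\']*\')', '');
        return cleaned;
    }

    // Tries to render the sanitized HTML; returns null if the PDF parser still rejects it.
    public static Blob toPdfOrNull(String html) {
        try {
            return Blob.toPdf(sanitize(html));
        } catch (Exception e) {
            // Includes System.InvalidParameterValueException from unparseable input.
            return null;
        }
    }
}
```

In the email handler from the question, the call would then look something like attachment.body = HtmlPdfSanitizer.toPdfOrNull(attachmentBody); with a fallback (for example the plain-text body) when null comes back. Note that this sketch does not repair unclosed tags, so badly malformed email HTML may still fail to parse, and regex-based stripping will never be as reliable as a real HTML parser.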
