[SalesForce] How to render PDF attachments into HTML

I have a requirement to get all attachments of a Lead, which are generally PDFs, and merge them into one single PDF document.

I have tried the code below.

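String str = '';
// Fetch the body of every attachment on the lead
List<Attachment> atts = [SELECT Id, Body FROM Attachment WHERE ParentId = :leadid];
for (Attachment att : atts) {
    // Append each body as Base64 text
    str += EncodingUtil.base64Encode(att.Body);
}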

I am using str to display the result in a Visualforce page, but it did not work.

Any workarounds or suggestions will be greatly appreciated.

Best Answer

You can't just combine pieces of binary data together and expect it to work. This approach fails on several counts:

Combining Base64

Base64 converts every three bytes of input into four ASCII-safe characters. As a side effect, only data whose length is an exact multiple of three bytes can be stitched together in Base64 without heavy processing. This is because the encoder pads the output to a multiple of four characters by appending "=" or "==". The padding also serves as an "end of stream" marker, meaning that given two Base64 strings:

9jfl4eiajf9aealoicg==
9AOdcjj34Lj932kmca+8=

(NOTE: These are random Base64 characters with padding, with the last four characters adjusted to form proper terminators.) Simply adding them together results in:

9jfl4eiajf9aealoicg==9AOdcjj34Lj932kmca+8=

But "=" is only valid at the end of a base64 stream, so it would fail.

Furthermore, removing the "==" in the middle:

9jfl4eiajf9aealoicg9AOdcjj34Lj932kmca+8=

This also won't work, because "cg==9AOd" decodes differently from "cg9AOd":

9AOd
11110100 00000011 10011101

cg9A
01110010 00001111 01000000

In case it's not obvious: the first byte of 9AOd is 11110100, but once 9A becomes the third and fourth characters of a group, the same characters contribute to the bytes 00001111 01000000 instead. With the padding removed, cg leaves four unused bits behind, which shifts the rest of the stream by four bits, so the file becomes invalid.
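To make the failure concrete, here is a small Python sketch (Python rather than Apex, purely for illustration; the helper names strict_decode_fails and join_base64 are made up for this example). It encodes one byte that yields "cg==" and three bytes that yield "9AOd", then shows that strict decoding rejects the glued string, that dropping the inner padding produces different bytes, and that decoding each piece before joining recovers the original bytes. Note that this only yields valid Base64; it does not turn the joined bytes into a valid file.

```python
import base64
import binascii

first = b"r"                        # 1 byte  -> "cg==" (two padding characters)
second = bytes([0xF4, 0x03, 0x9D])  # 3 bytes -> "9AOd" (no padding needed)

enc_first = base64.b64encode(first).decode("ascii")
enc_second = base64.b64encode(second).decode("ascii")


def strict_decode_fails(text):
    """Return True if strict Base64 decoding rejects the text."""
    try:
        base64.b64decode(text, validate=True)
        return False
    except binascii.Error:
        return True


# 1) Gluing the encoded strings leaves "==" in the middle of the stream.
glued = enc_first + enc_second                 # "cg==9AOd"
glued_rejected = strict_decode_fails(glued)

# 2) Dropping the inner padding shifts every later bit by four positions.
stripped = enc_first.rstrip("=") + enc_second  # "cg9AOd"
stripped += "=" * (-len(stripped) % 4)         # re-pad so it can be decoded
shifted = base64.b64decode(stripped)


def join_base64(parts):
    """Hypothetical helper: decode each part, join the raw bytes, encode once."""
    raw = b"".join(base64.b64decode(p, validate=True) for p in parts)
    return base64.b64encode(raw).decode("ascii")


rejoined = base64.b64decode(join_base64([enc_first, enc_second]))

print(glued_rejected)              # True
print(shifted == first + second)   # False
print(rejoined == first + second)  # True
```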

Binary Formats

But even if the files were Base64 byte-aligned, binary files have strict formats; you can't directly merge them together and hope that it will work. The parser would treat the extra data at the end as invalid, or ignore it, and would not render it. This is doubly true if you don't know the types of the attachments or their order. For example, a PDF with a GIF stuck on to the end of it would be unreadable in most software. A GIF, JPEG, and PDF all strung together in binary format might allow the GIF to be rendered by some software, but the remaining data would be seen as extra data and discarded.
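One practical consequence is that software decides what a file is from its leading bytes, so it helps to check each attachment's signature before attempting anything. A minimal Python sketch, assuming only the well-known signatures for PDF, GIF, JPEG, and PNG; the function name sniff_type and the sample byte strings are invented for illustration and are not complete files:

```python
# Leading "magic" bytes for a few common formats.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
]


def sniff_type(data):
    """Guess a format from the first bytes only; returns None if unknown."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None


# Placeholder payloads: just enough bytes to carry a signature.
fake_pdf = b"%PDF-1.4\n...pdf body...\n%%EOF\n"
fake_gif = b"GIF89a...gif body..."

print(sniff_type(fake_pdf))             # pdf
print(sniff_type(fake_gif + fake_pdf))  # gif
print(sniff_type(b"hello"))             # None
```

The second call shows that a GIF with a PDF glued on after it still looks like a GIF; the PDF is just trailing bytes as far as this check is concerned.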

Text Formats

Even things like XML and HTML have specific rules. Even though they're textual, you can't just paste them together and expect a web browser to render them correctly (but it might, depending on the browser). They have to be merged together in a specific pattern in order for them to be considered syntactically valid. Also, you can't mix plain text with binary, because the parser would simply render the binary as plain text, appearing as garbage.
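XML shows the same thing in miniature. The Python sketch below uses the standard library's xml.etree.ElementTree; the element names and the wrapper element "merged" are arbitrary choices for the example. Pasting two documents together is rejected because an XML document may have only one root element, while parsing each one and attaching both under a new root produces something well-formed.

```python
import xml.etree.ElementTree as ET

doc_a = "<a>1</a>"
doc_b = "<b>2</b>"


def is_well_formed(text):
    """Return True if the text parses as a single XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def merge_under_root(docs, root_tag="merged"):
    """Parse each document and attach it to one new root element."""
    root = ET.Element(root_tag)
    for text in docs:
        root.append(ET.fromstring(text))
    return ET.tostring(root, encoding="unicode")


pasted = doc_a + doc_b
merged = merge_under_root([doc_a, doc_b])

print(is_well_formed(pasted))  # False
print(merged)                  # <merged><a>1</a><b>2</b></merged>
```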

Merging Data

So, given this information, it would seem hopeless to try anything of this nature. However, all is not lost. There are services out there that can safely stitch together limited types of data, such as extracting pages from a PDF (@SF_Ninja mentions one solution for this). However, you would still be limited to whatever the service can support; you wouldn't necessarily be able to stitch together a PDF, a Microsoft Word document, and three image files into a single PDF. A dedicated programmer could also write a tool to combine various types of data.

However, you can't expect this to work in Apex Code. Given the limited amount of CPU time available, simply decoding and parsing most data formats would easily burn through the governor limits. External processing is key to a successful merge. This means that you have to identify the types of files you want to merge and how they would be organized, then write a processor script (such as the one mentioned previously), then create an integration between the two. In all likelihood, such a script wouldn't handle more than a handful of file types, since most types have no meaning when combined (e.g. you could combine a MIDI and an MP3, or an MP3 and a set of images to make a movie, but how would you combine a PDF and a MIDI file?).
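For the PDF-only case in the question, the external step could look roughly like the Python sketch below. It assumes the third-party pypdf library is installed on the external server (chosen here only for illustration, not something named above), that the attachment bodies have already been fetched from Salesforce as raw bytes, and that the inputs should be merged in the order given. The function name merge_pdfs is hypothetical.

```python
import io


def merge_pdfs(blobs):
    """Merge raw PDF byte strings, in order, into one PDF byte string.

    Rejects anything that does not start with the PDF signature, because
    this sketch deliberately supports a single file type.
    """
    for index, data in enumerate(blobs):
        if not data.startswith(b"%PDF-"):
            raise ValueError(f"attachment {index} is not a PDF")

    # Assumption: pypdf is available; imported late so the check above
    # works even where the library is not installed.
    from pypdf import PdfWriter

    writer = PdfWriter()
    for data in blobs:
        writer.append(io.BytesIO(data))  # append every page of this PDF
    out = io.BytesIO()
    writer.write(out)
    return out.getvalue()
```

The merged bytes would then have to be sent back to Salesforce and stored as a new attachment; that integration is not shown here.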

Conclusion

Hopefully, this answer makes it clear that a narrow scope of functionality has to be defined for a merge to have any chance of working. There are too many file formats out there that aren't compatible with each other, so a narrowly defined solution is the key.
