[SalesForce] How to read text content of MS Word files in Salesforce

I know we can search within the content of a Word file in Salesforce using SOSL. But reading the VersionData field of ContentVersion returns a Blob that is not UTF-8 text, so we could not convert it into a text string in Apex.

I'm looking for any way to read the content of an MS Word document (stored as a Content file) as a text stream.

If this is not possible natively within the platform, does anyone know of any REST API services that offer this function?

Best Answer

Reading Word files in Apex is not trivial. I use JavaScript libraries to do it instead. Here is one of the tools that I use:

https://docxtemplater.readthedocs.io/en/latest/installation.html#browser

It does an OK job.
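
For context on why client-side JavaScript works where Apex struggles: a .docx file is a ZIP archive, and the main body text lives in an XML part named word/document.xml. Once a ZIP library has given you that part as a string, pulling out the text is mostly a matter of collecting the w:t runs inside each w:p paragraph. The following is only a rough sketch of that last step, not how docxtemplater does it internally. The function name extractDocxText is made up for illustration, it assumes the archive has already been unzipped by a library of your choice, and it ignores tables, text boxes, headers, footers, and the older binary .doc format.

```javascript
// Hypothetical helper (not part of docxtemplater or Salesforce).
// Input: the contents of word/document.xml as a string, already unzipped.
// Output: plain text, one line per Word paragraph.
function extractDocxText(documentXml) {
  // Decode the five predefined XML entities in a single pass.
  const entities = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'" };
  const decode = (s) =>
    s.replace(/&(amp|lt|gt|quot|apos);/g, (_, name) => entities[name]);

  // <w:p> or <w:p attr="..."> opens a paragraph; the [ >] check keeps
  // <w:pPr> (paragraph properties) from being mistaken for one.
  const paragraphPattern = /<w:p[ >][\s\S]*?<\/w:p>/g;
  // <w:t> or <w:t xml:space="preserve"> holds a run of text; <w:tab/> and
  // <w:tbl> do not match because a space or ">" must follow "<w:t".
  const textPattern = /<w:t(?: [^>]*)?>([\s\S]*?)<\/w:t>/g;

  const paragraphs = [];
  for (const paragraph of documentXml.matchAll(paragraphPattern)) {
    let text = "";
    for (const run of paragraph[0].matchAll(textPattern)) {
      text += decode(run[1]);
    }
    paragraphs.push(text);
  }
  return paragraphs.join("\n");
}
```

In a Visualforce page or Lightning component you would typically fetch the ContentVersion body in the browser, unzip it with a JavaScript ZIP library, and pass the word/document.xml string to a helper like this one, or simply rely on the library linked above.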
