Salesforce Storage Issue – Hitting Data Storage Limits with Email Messages

data-storage, emailmessage

We keep hitting our Salesforce data storage limit. Before spending more on storage, I want to make sure I've done my due diligence.

Last month we purchased 3 more units of storage, which should have lasted us to the end of the contract; however, we are now at our cap again.

We checked storage usage, and Email Messages account for 82% of our data storage: 325,429 records using 10.8 GB.
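As a quick sanity check on those numbers, dividing the reported usage by the record count gives the average footprint of one EmailMessage. This sketch assumes the storage page's "GB" means 1024³ bytes; with decimal gigabytes (`binary=False`) the result is closer to 33 KB. Either way it is far above the roughly 2 KB that most Salesforce records count for.

```python
def average_record_bytes(used_gb: float, record_count: int, binary: bool = True) -> float:
    """Average storage per record, given total usage in GB and a record count."""
    # Assumption: binary gigabytes (1024**3 bytes) unless binary=False is passed.
    bytes_per_gb = 1024 ** 3 if binary else 1000 ** 3
    return used_gb * bytes_per_gb / record_count


if __name__ == "__main__":
    avg = average_record_bytes(10.8, 325_429)
    print(f"~{avg / 1024:.1f} KB per EmailMessage")  # ~34.8 KB per EmailMessage
```

That is about 17 times the 2 KB baseline, which suggests the email bodies, not the record count, are what drive this object's storage.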

From what I can see, we have 3 storage types:

Data Storage – 13.5 GB Limit – 13.1 GB Used – 98% Used

File Storage – 224 GB Limit – 58.3 GB Used – 25% Used

Big Object Storage – 1,000,000 Record Limit – 0 Records Used – 0% Used

Since the trail of email communications is critical to our support team's conversations (and, to a lesser extent, our sales conversations), we would ideally like to retain full visibility into those communications.

How can we reduce the storage burden of existing Email Messages? Can we reduce the size of Email Messages associated with Cases going forward?

Some initial quick thoughts:

  • Purge Email Messages related to closed Cases flagged as spam

  • Remove headers / convert rich text to basic text

  • Export emails associated with Cases older than X years, delete them as Email Messages, and re-attach them to the Case as attachments (since attachments count toward File Storage rather than Data Storage)

Best Answer

My org has run into similar issues with the amount of data storage consumed by EmailMessage. Unfortunately, EmailMessage is a difficult object to work with.

From my own testing:

  • Two fields on EmailMessage are treated differently by Salesforce when it comes to storage: TextBody and HtmlBody. These fields consume storage for every byte stored (i.e., count your characters and that's roughly how many bytes are consumed; multi-byte characters push that up even higher).
  • Salesforce also treats the HtmlBody field specially so that the HTML can be rendered (I was not able to reproduce this with another field).
  • If you get an email with only an HtmlBody, Salesforce also uses it to populate the TextBody field.
  • The TextBody and HtmlBody fields cannot be updated unless the Status is "Draft" (i.e., you're composing the email from the Salesforce UI).
  • We can clone EmailMessage records (which lets us alter the two body fields), and the "Set Audit Fields upon Record Creation" permission can be granted so that system fields like CreatedById, CreatedDate, LastModifiedById, and LastModifiedDate are preserved on the clones.
  • If CreatedDate is carried over, the relative record order does appear to be preserved.
  • If you transfer the content of TextBody and HtmlBody to custom fields (and null out TextBody and HtmlBody), EmailMessage records only take the standard 2 KB that almost every other record does.
  • The easiest way for us to "clone" EmailMessage records is to use something like Dataloader to extract the records and then re-insert them (i.e., don't map the Id field). If you're moving TextBody and HtmlBody to custom fields, a simple re-mapping in Dataloader handles that.
  • Attachments on emails are another matter (and I don't recall if I worked out how to handle that)
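The extract, re-map, and re-insert workflow from the bullets above can be sketched in Python over a Dataloader CSV export. The custom field names `Text_Body__c` and `Html_Body__c` are hypothetical (substitute your own Long Text Area fields), and the sketch assumes the export contains only `Id`, the two body fields, and columns you actually intend to insert (for example ParentId, Status, and the audit fields). It drops `Id` so the rows insert as new records, copies each body into its custom field, leaves the standard body fields out of the re-insert file, and reports how many UTF-8 bytes were moved off the standard fields. Deleting the original records afterwards is a separate step not shown here.

```python
import csv
import io

# Hypothetical custom Long Text Area fields on EmailMessage; substitute your own API names.
BODY_FIELD_MAP = {"TextBody": "Text_Body__c", "HtmlBody": "Html_Body__c"}
# Columns left out of the re-insert file: Id (so new records are created) and the standard bodies.
DROPPED_COLUMNS = {"Id", *BODY_FIELD_MAP}


def utf8_size(value: str) -> int:
    """Bytes the string occupies in UTF-8; multi-byte characters count more than once."""
    return len(value.encode("utf-8"))


def remap_export(src, dst) -> int:
    """Copy an EmailMessage export from src to dst, moving bodies to custom fields.

    Returns the total UTF-8 bytes moved out of TextBody and HtmlBody.
    """
    reader = csv.DictReader(src)
    kept = [name for name in reader.fieldnames if name not in DROPPED_COLUMNS]
    writer = csv.DictWriter(dst, fieldnames=kept + list(BODY_FIELD_MAP.values()))
    writer.writeheader()
    moved = 0
    for row in reader:
        out = {name: row[name] for name in kept}
        for standard, custom in BODY_FIELD_MAP.items():
            body = row.get(standard) or ""
            out[custom] = body
            moved += utf8_size(body)
        writer.writerow(out)
    return moved


if __name__ == "__main__":
    export = io.StringIO(
        "Id,ParentId,TextBody,HtmlBody\n"
        "02s000000000001,500000000000001,Danke schön,<p>Danke schön</p>\n"
    )
    reinsert = io.StringIO()
    print(remap_export(export, reinsert), "bytes moved")
    print(reinsert.getvalue())
```

For a real export, pass files opened with `open(path, newline="", encoding="utf-8")` instead of the in-memory `StringIO` objects; the returned byte count gives a rough idea of how much body content leaves the standard fields.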

So we can reduce the storage consumed by EmailMessage records, but there are some serious drawbacks to consider: the HTML portion of emails won't be rendered in Salesforce if you store it in a custom field instead of HtmlBody, and the standard "send an email" button may stop working for such an EmailMessage.

In my org (and most likely in others too), one of the big offenders is that Salesforce stores the entire email thread every time a customer replies. So if your email conversation goes through 3 round trips (you -> customer -> you -> customer -> you -> customer), then by the time the final email comes into Salesforce you're effectively storing 21 emails (1 + 2 + 3 + 4 + 5 + 6).

Repeated bits like email signatures and the to/from/cc/date information don't help matters either.
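The thread effect grows quadratically. Assuming every message (inbound or outbound) quotes the complete thread before it, and ignoring signatures and headers, message k carries k emails' worth of content, so a thread of n messages stores 1 + 2 + ... + n = n(n + 1)/2 in total. A minimal sketch of that count:

```python
def stored_email_copies(messages_in_thread: int) -> int:
    """Emails' worth of content stored when every message quotes the full thread.

    Message k carries itself plus the k - 1 messages before it, so the total
    is 1 + 2 + ... + n = n * (n + 1) / 2.
    """
    n = messages_in_thread
    return n * (n + 1) // 2


if __name__ == "__main__":
    # 3 round trips = 6 messages (you -> customer, three times)
    print(stored_email_copies(6))   # 21
    print(stored_email_copies(20))  # 210
```

Under the same assumption a 20-message thread already stores 210 emails' worth of content, which is why deleting or trimming superseded messages can pay off.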

Here are the possible solutions I came up with, in order of storage space saved (most savings to least savings):

  1. Use a trigger, scheduled/batch Apex, or a middleware tool to capture EmailMessages as they arrive and send them to an external platform (I'm currently investigating Snowflake for this). Develop a Lightning component that makes callouts to retrieve the email data on demand. Once the data is stored off-platform, delete the EmailMessage from Salesforce. (Fairly customization-heavy; disrupts standard functionality)
  2. Delete earlier emails under a given parent when a new one arrives (chances are good that the new email still contains the entire thread)
  3. Archive/Export emails older than a certain age
  4. Use Dataloader (or a similar tool) to effectively clone records, and map TextBody and HtmlBody to custom fields (likely disrupts standard functionality, but saves a bunch of space)
  5. Use Dataloader (or a similar tool) to clone records and simply leave the HtmlBody field unmapped (I think I found that HTML takes something like 3x as many characters as plain text on average)
  6. Use Dataloader (or a similar tool) to clone records and remove redundant information (signatures, the entire thread history; actually processing the data to detect and remove that redundant information is another matter)
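Option 2 comes down to choosing, for each parent record, which EmailMessage to keep. One way to pick the deletion candidates is sketched below: keep only the newest message per ParentId and return the Ids of the rest. It assumes the records were already queried (for example with `SELECT Id, ParentId, CreatedDate FROM EmailMessage`) and that CreatedDate arrives in the form `2023-05-01T10:00:00.000+0000`; the deletion itself is not shown. As option 2 itself notes, this is only safe when the newest message really does contain the whole thread.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Mapping, Tuple

# Assumed CreatedDate format, e.g. "2023-05-01T10:00:00.000+0000".
SF_DATETIME = "%Y-%m-%dT%H:%M:%S.%f%z"


def superseded_ids(records: Iterable[Mapping[str, str]]) -> List[str]:
    """Return the Ids of every EmailMessage except the newest one under each ParentId."""
    all_records = list(records)
    newest: Dict[str, Tuple[datetime, str]] = {}
    for rec in all_records:
        created = datetime.strptime(rec["CreatedDate"], SF_DATETIME)
        current = newest.get(rec["ParentId"])
        # Keep the later message; on a tie, the first one seen wins.
        if current is None or created > current[0]:
            newest[rec["ParentId"]] = (created, rec["Id"])
    keep = {message_id for _, message_id in newest.values()}
    return [rec["Id"] for rec in all_records if rec["Id"] not in keep]


if __name__ == "__main__":
    sample = [
        {"Id": "02sA", "ParentId": "500X", "CreatedDate": "2023-05-01T10:00:00.000+0000"},
        {"Id": "02sB", "ParentId": "500X", "CreatedDate": "2023-05-03T09:30:00.000+0000"},
        {"Id": "02sC", "ParentId": "500Y", "CreatedDate": "2023-05-02T08:00:00.000+0000"},
    ]
    print(superseded_ids(sample))  # ['02sA']
```

Parsing CreatedDate into timezone-aware datetimes, rather than comparing the raw strings, keeps the ordering correct even if some timestamps carry a different UTC offset.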