[SalesForce] Query to see all Files related to specific object (Events)

The following post talks about running a query to return Files related to a specific record:

How to query all files attached to the Opportunity

What I'm trying to do is return ALL files related to an object (Events in this case). I tried the following query, but it returned an error that indicates that I need to include all of the record id's that I want it to look through (which could be millions). I just want the query to return the results based on the associated object, without having to put in individual record id's

SELECT ContentDocumentId,Id,LinkedEntityId FROM ContentDocumentLink WHERE LinkedEntityId LIKE '00U%'

Here's the error I received:

MALFORMED_QUERY: Implementation restriction: ContentDocumentLink requires a filter by a single Id on ContentDocumentId or LinkedEntityId using the equals operator or multiple Id's using the IN operator.

Best Answer

ContentDocumentLink, as you can see from that error message, has very strict requirements on how you can query it. You must supply a literal Id value or a set of Id values.

This is a major limitation. The only way I have found around it for very broad-based or global queries like the one you're interested in is a two-stage process off-platform with some Python glue to construct ContentDocumentLink queries. (If there is a better solution, I would love to hear about it!)

You may be able to do something similar in an Apex batch job depending on what you're hoping to do with these files/ContentDocumentLinks.

First, you'd use a report, or Workbench Bulk API job, to pull the full list of Ids of Events in your organization. Then you take that huge list of Ids and paste it into a Python script, as below. (Note that this requires the superb simple_salesforce module, which I use for this kind of glue-API-scripting all the time).

What this script does is iterate over the arbitrarily-large set of Ids you provide it and construct ContentDocumentLink queries by batching those Ids into groups as large as allowed by the platform. It saves the ContentDocumentLink data in CSV format on standard output.

#!/usr/bin/env python

from simple_salesforce import Salesforce 
import csv
import sys

ids = [x for x in """
PASTE_YOUR_ID_LIST_HERE
""".split('\n') if len(x) > 0]

query_start = """
SELECT Id, LinkedEntityId, ContentDocumentId, Visibility, ShareType
FROM ContentDocumentLink
WHERE ContentDocumentId IN (
"""

query_end = """
)
ORDER BY ContentDocumentId
"""

query = ''
sf = Salesforce(username='YOUR_USER_NAME', password='YOUR_PASSWORD', security_token = 'YOUR_SECURITY_TOKEN')
writer = csv.DictWriter(f = sys.stdout, fieldnames=['Id', 'LinkedEntityId', 'ContentDocumentId', 'Visibility', 'ShareType'])
writer.writeheader()

while len(ids) > 0:
    query = '\'' + ids.pop() + '\''

    # The maximum length of the WHERE clause is 4,000 characters
    while len(query) < 4000 and len(ids) > 0:
        query += ', \'' + ids.pop() + '\''

    cdls = sf.query_all(query_start + query + query_end)

    for cdl in cdls.get('records'):
        writer.writerow({key: cdl[key] for key in writer.fieldnames})