[SalesForce] Single query for all records without duplicate in field

I am doubting this is possible in SOQL given the requirements and limitations of the language, but wanted to sanity check to make sure I am not overlooking any possible solutions before I throw in the towel.

We are currently building an azure application for which we need to migrate a large subset of our Salesforce users into. I have been tasked with building a query that will return all of the user records that have a certain field filled in, but more importantly need to eliminate all of the users who share an email with any other user in the SF org. I cannot use apex to go through the user records and eliminate the duplicate values.

In SQL, I believe this could be accomplished with using a where clause like this-

WHERE NOT EXISTS (SELECT 1 FROM User U2 WHERE U1.Email = U2.Email AND U1.Id <> U2.ID)

But so far, the closest I have been able to come to accomplishing this in SOQL is the following query-

SELECT Email, FirstName, LastName, Id, IsActive, Username FROM User WHERE Custom_Field_A__c != null AND isActive = true GROUP BY FirstName, LastName, Username, Id, IsActive, Email HAVING COUNT(Id) = 1 Limit 200

However, there are several problems with this query as is. The first is the only way it works is to limit to 200 records, when the number of users that needs to be returned is a much larger number, and there does not seem to be a way to run multiple batches of the query when aggregating fields. It also needs to return the created and modified dates, which seem to not be able to be included in a query that is aggregating fields. Lastly, the way this is group makes it so that all of the values collectively have to be not shared by any other records, but I really just want to filter out users with a duplicate email, not really caring if they have a duplicate first or last name.

Would it be possible to reform this query in some fashion that it will return all of the users in which the Custom_Field_A__c is not null, and the email is not shared between any other users, and all of the fields mentioned can still be returned, without using apex to assist in some fashion?

Best Answer

You can do it with just SOQL, though you may still need to add further filtering to reduce record count, since it will use aggregate queries, which do not support queryMore. But the HAVING clause may give you exactly what you need. The basic idea would be:

SELECT Email
FROM User
WHERE My_Field__c != null
GROUP BY Email
HAVING count(Id) = 1

If you group by more fields, the above will only deduplicate for the comprehensive set of filters. So if you wanted to deduplicate for only Email, then you might need two separate queries.