[SalesForce] Fuzzy Logic / matching on Company Name (leads)

I need some help.

I have looked at the conventional de-duping tools, but they don't fit my situation. I work for a multi-alliance organisation, and those tools look at the entire database when I actually need to look at segments of it.

So I have decided to try to create my own de-duping tool.

So far I have created the following Apex trigger.
It currently looks at the company name on the lead, and if there is an exact match with another company name in the database, it shows the user an error message such as "A lead with this company name already exists."

This is great if the company name is an exact match, but I need it to be more flexible.

For example, suppose “Burger King Limited” was a lead created in 2012, and in 2013 a seller creates a lead called “Burger King LTD”, which is the same company as the 2012 lead. I want to build fuzzy logic that looks at the new lead and rejects it if it bears even a slight resemblance to an existing one.

trigger DuplicateLeadPreventer on Lead (before insert, before update) {

    // Get the map of record types we care about from a custom setting.
    Map<String, Manage_Lead_Dupes_C__c> leadrtmap = Manage_Lead_Dupes_C__c.getAll();

    // Since only certain leads will match, put them in a separate list.
    List<Lead> leadsToProcess = new List<Lead>();

    // Lower-cased company name to lead map. Keys are lower-cased because
    // Apex map and set lookups are case-sensitive, while SOQL string
    // comparisons are not.
    Map<String, Lead> leadMap = new Map<String, Lead>();

    for (Lead lead : Trigger.new) {

        // Only process leads whose record type is in our record type map.
        if (leadrtmap.keySet().contains(lead.RecordTypeId)) {

            // Make sure we don't treat a company name that isn't
            // changing during an update as a duplicate.
            if ((lead.Company != null) &&
                (Trigger.isInsert ||
                 (lead.Company != Trigger.oldMap.get(lead.Id).Company))) {

                String companyKey = lead.Company.toLowerCase();

                // Make sure another new lead isn't also a duplicate.
                if (leadMap.containsKey(companyKey)) {
                    lead.Company.addError('Another new lead has the '
                                          + 'same company name.');
                } else {
                    leadMap.put(companyKey, lead);
                    leadsToProcess.add(lead);
                }
            }
        } // end record type check
    } // end loop

    /*
     Using a single database query, find all the leads in
     the database that have the same company name as any
     of the leads being inserted or updated.
    */
    Set<String> existingCompanies = new Set<String>();

    for (Lead l : [SELECT Id, Company FROM Lead
                   WHERE Company IN :leadMap.keySet()
                   AND RecordTypeId IN :leadrtmap.keySet()]) {
        existingCompanies.add(l.Company.toLowerCase());
    }

    // Now flag every new or changed lead whose company name already exists.
    for (Lead l : leadsToProcess) {
        if (existingCompanies.contains(l.Company.toLowerCase())) {
            l.Company.addError('A lead with this company '
                               + 'name already exists.');
        }
    }
}

Best Answer

Fuzzy matching on data is an entire industry and, because of its programmatic complexity, it is not well suited, in its most advanced forms, to the Force.com platform. Most companies that offer comprehensive data de-duplication and other master data management services have their core engine outside of Salesforce.

What you're trying to build is admirable and definitely very fun, but in practical terms it is not possible unless you set very hard limits on the scope of what you can dedupe, i.e. exact or near-exact matches. I do know of a particular company that has built fuzzy logic on the platform for this purpose, but it was very difficult and quite limited, and it was their entire company's purpose!
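
To illustrate what a tightly scoped "near-exact" match could look like, here is a minimal sketch that normalises a company name before comparing it. It is not a full fuzzy matcher. The class name `CompanyNameNormalizer` and the suffix list are my own assumptions for illustration, and the character filter only keeps ASCII letters and digits, so it would need adjusting for non-English names.

```apex
public class CompanyNameNormalizer {

    // Hypothetical list of legal suffixes to ignore; extend it for your own data.
    private static final Set<String> LEGAL_SUFFIXES = new Set<String>{
        'limited', 'ltd', 'plc', 'inc', 'incorporated',
        'llc', 'corp', 'corporation', 'co', 'company'
    };

    // Returns a simplified form of the company name for exact comparison.
    public static String normalize(String company) {
        if (String.isBlank(company)) {
            return '';
        }

        // Lower-case, turn punctuation into spaces, collapse runs of whitespace.
        // Assumption: only ASCII letters and digits are significant.
        String cleaned = company.toLowerCase()
            .replaceAll('[^a-z0-9 ]', ' ')
            .normalizeSpace();

        List<String> words = cleaned.split(' ');

        // Drop trailing legal suffixes, but always keep at least one word.
        while (words.size() > 1
               && LEGAL_SUFFIXES.contains(words[words.size() - 1])) {
            words.remove(words.size() - 1);
        }
        return String.join(words, ' ');
    }
}
```

With this sketch, both "Burger King Limited" and "Burger King LTD" normalise to `burger king`, so an exact comparison on the normalised value would catch that pair. To use it in a trigger like the one in the question, you would probably need to store the normalised value on each lead (for example in a custom field, which you would have to create) so that a single SOQL query can still find the existing matches.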

If you do want to try out some of the concepts and algorithms that are being used elsewhere, have a look at the background and academic work behind record linkage.
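
One of the simplest ideas from that field is edit distance. The sketch below uses Apex's built-in `String.getLevenshteinDistance` to turn the distance between two names into a score between 0 and 1. The class name `CompanyNameSimilarity`, its method names, and the threshold parameter are assumptions for illustration; a sensible threshold would have to be tuned against your own data.

```apex
public class CompanyNameSimilarity {

    // Returns a score from 0 (nothing in common) to 1 (identical strings),
    // based on the Levenshtein edit distance relative to the longer string.
    public static Decimal similarity(String a, String b) {
        if (a == null || b == null) {
            return 0;
        }
        Decimal longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1; // two empty strings are identical
        }
        Decimal distance = a.getLevenshteinDistance(b);
        return 1 - (distance / longest);
    }

    // Hypothetical helper: treats two names as likely duplicates when the
    // score reaches the caller-supplied threshold.
    public static Boolean isLikelyDuplicate(String a, String b, Decimal threshold) {
        return similarity(a, b) >= threshold;
    }
}
```

Bear in mind that this compares one pair of names at a time. Running it for every new lead against every existing lead in a segment is exactly the kind of work that runs into governor limits, which is why the scope has to stay small on the platform.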
