[SalesForce] Bulkified generation of unique 6-character alphanumeric IDs, with strict validation criteria

I have a requirement to generate a unique 6-character alphanumeric ID for records on a bulkified before-insert trigger, on an object already containing about 5 million records and expected to grow to upwards of 40 million over the next couple years. The ID must conform to the following rules:

  1. Must contain at least one number and one letter
  2. Every character in the ID must be unique
  3. Cannot contain the numbers 0 or 1, or the letters O or I

My current approach is essentially to create an auto-incrementing base-32 field which skips the values that don’t conform to rules 1 and 2, and uses W, X, Y, and Z instead of the characters mentioned in rule 3.

I have this “working” already, but the method I’m using to skip the invalid values is clearly too inefficient for a before-insert trigger. Brute force sequential validation is not viable because sometimes there are long gaps between valid numbers. For instance, there are over a million consecutive invalid values between 32VUTS and 34WX25 (103807932 and 104858693 in base-10). My current solution spans this gap with only about 33k values checked, but it still consistently uses up 7-8 seconds of the 10-second CPU time governor limit.

Current implementation:

Integer lastFound = 103807931; // base-10 value of last ID used, retrieved from custom setting

Integer recordCount = 200; // # of records in Trigger.new;

List<String> valid = new List<String>();
while(valid.size() < recordCount){
    valid.add(doIncrement(lastFound));
}

System.debug(Limits.getCpuTime());

public String doIncrement(Integer startAfter){
    Integer current = startAfter + 1;
    String current32 = decToBase32_Alt(current);    

    while(matchFormat(current32) == false){   

        List<Integer> chars = current32.getChars();        
        Integer position = -1;       

        for(integer i = 0; i < chars.size() - 1 && position == -1; i++){

            for(Integer j = i + 1; j < chars.size(); j++){
                if(chars[i] == chars[j]){
                    position = chars.size() - j - 2;
                    break;
                }
            }        
        }
        if(position > 0){
            current = current + 32^position;
        }
        else{
            current++;
        }

        current32 = decToBase32_Alt(current);
    }

    lastFound = current;
    return current32;
}    

// validation: all unique characters, at least one number and one letter
public Boolean matchFormat(String input){    
    return new Set<Integer>(input.getChars()).size() == input.length() && input.isAlpha() == false && input.isNumeric() == false;    
}

// base-10 to base-32, alternate character set
public String decToBase32_Alt(Integer val){

    List<String> charSet = new List<String>{'W','X','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','Y','J','K','L','M','N','Z','P','Q','R','S','T','U','V'};

        String result = ''; 

    while(val != 0){
        Integer remainder = Math.mod(val,32);
        val = Math.floor(val/32).intValue();
        result = charSet[remainder] + result;
    }        
    return result;    
}

I am not a mathematician, but intuitively it seems very likely that there is some way to do this much more efficiently. Any ideas?

I am also open to suggestions for other ways of achieving this requirement. I have considered random generation coupled with collision retries, as suggested by this answer, but I’m concerned that there would be far too many collisions given the number of valid IDs (about 126 million, I believe) and tens of millions of records.

Credit to this answer for implementation of the base-32 conversion.

Best Answer

Instead of thinking about the system as a single number, consider the system as a set of 6 smaller numbers. This lets you get rid of most of the naivety of a brute force algorithm.

Here's the version I came up with:

public class q173224 {
    public static Integer nextId(Integer value) {
        value++;
        Integer a = (value >> 25) & 31, b = (value >> 20) & 31, c = (value >> 15) & 31,
            d = (value >> 10) & 31, e = (value >> 5) & 31, f = value & 31;
        for(; a < 8; (b = 8) == 8 && (a = a + 1) >= 0) {
            for(; b < 32; (c = 0) == 0 && (b = b + 1) >= 0) {
                if(a != b) {
                    for(; c < 32; (d = 0) == 0 && (c = c + 1) >= 0) {
                        if(a != c && b != c) {
                            for(; d < 32; (e = 0) == 0 && (d = d + 1) >= 0) {
                                if(a != d && b != d && c != d) {
                                    for(; e < 32; (f = 0) == 0 && (e = e + 1) >= 0) {
                                        if(a != e && b != e && c != e && d != e) {
                                            for(; f < 32; f++) {
                                                if(a != f && b != f && c != f && d != f && e != f) {
                                                    return (a << 25) + (b << 20) + (c << 15) + (d << 10) + (e << 5) + f;
                                                }
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
        return null;
    }
    Static String[] keys = '23456789ABCDEFGHJKLMNPQRSTUVWXYZ'.split('');
    public static String base32encode(Integer value) {
        return keys[(value >> 25) & 31] + keys[(value >> 20) & 31] + keys[(value >> 15) & 31] + 
            keys[(value >> 10) & 31]  +  keys[(value >> 5) & 31]  +  keys[value & 31];
    }
}

P.S. I apologize for the ugly hacks, but this code is optimized for performance, not legibility.

A few explanations:

(X = Y) == Z

This is a trick to simulate Java's "," operator, which would have greatly simplified the code. In Java, it would have been written as:

X = Y, W++;

Of course, << and >> are left-shift bits, and right-shift bits; it's a lot easier than having to write things like a / 33554432 and so on. Likewise, we're using & 31 to get Math.mod(value, 32). This entire code is optimized to run like greased lightning and for code coverage purposes (just pass in 0 and 1<<31 to get 100% coverage of nextId).

Finally, the first parameter, the initialization of the loop, isn't done in the traditional way, because we need to start the loop at a predetermined value instead of starting at 0 (or some specific value).

Converting this to Java, I found that there are actually a total of 126,282,240 unique ID values that fit your criteria, allowing 8 possible values in the first position, 24 possible values in the second position, and 32 possible values for all other positions, with no duplicate characters in any position. The code never has to go more than 10 false loops to find a unique value, instead of needing to skip hundreds of thousands of values one at a time; it can generate an ID in less than 2ms per ID.

This requires no random guesses, and just a single solitary database query to determine where to start. Keep in mind that you must use a query to get a row lock on your custom setting (FOR UPDATE) to make sure that you don't generate the same ID twice for transactions that are in-flight simultaneously.