[Salesforce] Best way to spot non-GSM characters in a string

There are other posts here about Apex Matcher methods and replacing characters in a string, but I simply need to know whether a given string contains any non-GSM characters. GSM here means the GSM 03.38 7-bit default alphabet, the standard character set for SMS text messages. I don't need to know which character it is or its index.

I found a regex for gsm that seems solid:

GSM_CHARACTERS_REGEX = "^[A-Za-z0-9 \\r\\n@£$¥èéùìòÇØøÅå\u0394_\u03A6\u0393\u039B\u03A9\u03A0\u03A8\u03A3\u0398\u039EÆæßÉ!\"#$%&'()*+,\\-./:;<=>?¡ÄÖÑܧ¿äöñüà^{}\\\\\\[~\\]|\u20AC]*$"
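Since the regex is already in hand, the quickest check is probably to invert its character class and look for a single match, rather than looping over the string. A sketch in JavaScript, assuming the Java-style escaping above is translated into a JavaScript regex literal; `NON_GSM_REGEX` and `containsNonGsm` are illustrative names, not from any library:

```javascript
// Same character class as the quoted GSM regex, negated with [^...].
// Greek letters and the euro sign use \u escapes, as in the original;
// \\ \[ \] \- \/ inside the class match the literal characters.
const NON_GSM_REGEX =
  /[^A-Za-z0-9 \r\n@£$¥èéùìòÇØøÅå\u0394_\u03A6\u0393\u039B\u03A9\u03A0\u03A8\u03A3\u0398\u039EÆæßÉ!"#$%&'()*+,\-.\/:;<=>?¡ÄÖÑܧ¿äöñüà^{}\\\[~\]|\u20AC]/;

// True if any character falls outside the class. test() stops at the
// first match, so no explicit loop over the string is needed.
function containsNonGsm(text) {
  return NON_GSM_REGEX.test(text);
}

console.log(containsNonGsm("Meet at 5 @ Café")); // false
console.log(containsNonGsm("Meet at 5 😀"));      // true
```

Like the original regex, this class does not include ¤ (GSM position 0x24; the regex lists `$` twice instead), so ¤ is reported as non-GSM. Adding it to the class is a one-character change.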

I'm thinking there must be something quicker than checking every character in my string, something like JavaScript's Array.prototype.some() method?

Any ideas?

I would use a keyup event to test only the latest character entered, but users may paste in a string or use a template that inserts the full text all at once.

I can handle this check either in JavaScript on the client or in Apex. My current plan: create a const array of all the GSM characters; then, immediately before sending to Apex, use JavaScript's string.split("") to build an array of all the characters, create a Set from that array to drop duplicates, and loop through the Set, checking whether the GSM array includes each character.
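The plan above can be tightened in two ways: keep the GSM characters in a Set so each lookup is constant-time instead of an `includes()` scan, and replace the explicit loop with `some()`, which returns as soon as one character fails. A sketch, assuming a hypothetical `GSM_CHARS` set holding the GSM 03.38 basic set plus its extension-table characters; `hasNonGsm` is likewise an illustrative name:

```javascript
// Hypothetical lookup set: GSM 03.38 basic characters (including LF and CR)
// followed by the extension-table characters (form feed ^ { } \ [ ~ ] | €).
const GSM_CHARS = new Set(
  "@£$¥èéùìòÇ\nØø\rÅå\u0394_\u03A6\u0393\u039B\u03A9\u03A0\u03A8\u03A3\u0398\u039EÆæßÉ" +
    " !\"#¤%&'()*+,-./0123456789:;<=>?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿" +
    "abcdefghijklmnopqrstuvwxyzäöñüà" +
    "\f^{}\\[~]|\u20AC"
);

function hasNonGsm(text) {
  // new Set(text) iterates by code point, so an emoji stays one entry
  // instead of two surrogate halves as it would with split("").
  const distinct = new Set(text);
  // some() stops at the first character that is not in GSM_CHARS.
  return [...distinct].some((ch) => !GSM_CHARS.has(ch));
}

console.log(hasNonGsm("Café {ok} €5")); // false
console.log(hasNonGsm("naïve"));        // true
```

De-duplicating first mostly pays off for long messages with many repeated characters; for short texts, calling `some()` directly on `Array.from(text)` is just as reasonable.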

Best Answer

I ended up doing it in JavaScript, in the onchange handler for the text entry. It converts the string into a Set to minimize the number of comparisons, then back into an array, which it filters against a const array of all GSM characters.

The GSM characters in my list (gsmChars) are documented on a number of sites. The challenge for me was finding the fastest way to check each character in a string against that list, so the check could run with every new character entered. The bottom part of my handler is just the functional part of what I needed: if there's a non-GSM character, indicate that the message will be broken into segments of up to 67 characters; otherwise, segments of up to 153 characters.
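The 67 and 153 figures follow from the 140-byte SMS payload once a concatenation header is added. A worked derivation, assuming the common 6-byte User Data Header with an 8-bit reference number:

$$
\begin{aligned}
\text{GSM-7, single SMS: } & \lfloor 140 \times 8 / 7 \rfloor = 160 \text{ septets} \\
\text{GSM-7, per segment: } & \lfloor (140 - 6) \times 8 / 7 \rfloor = \lfloor 153.14 \rfloor = 153 \text{ septets} \\
\text{UCS-2, single SMS: } & 140 / 2 = 70 \text{ characters} \\
\text{UCS-2, per segment: } & (140 - 6) / 2 = 67 \text{ characters}
\end{aligned}
$$

So for a message of $n$ units, the segment count is $1$ when $n \le L_{\text{single}}$ and $\lceil n / L_{\text{multi}} \rceil$ otherwise, with $(L_{\text{single}}, L_{\text{multi}}) = (160, 153)$ for GSM-7 and $(70, 67)$ for UCS-2. For example, a 200-character all-GSM message needs $\lceil 200/153 \rceil = 2$ segments, while a 160-character one still fits in a single SMS.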

// GSM 03.38 basic character set. Extension-table characters
// (^ { } \ [ ~ ] | €) are not listed, so they count as non-GSM here.
const gsmChars = [  'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
                'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
                'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
                'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
                '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
                '!', '#', ' ', '"', '%', '&', '\'', '(', ')', '*', ',', '.', '?',
                '+', '-', '/', ';', ':', '<', '=', '>', '¡', '¿', '_', '@',
                '$', '£', '¥', '¤', '\n', '\r', // line feed and carriage return are GSM too
                'è', 'é', 'ù', 'ì', 'ò', 'Ç', 'Ø', 'ø', 'Æ', 'æ', 'ß', 'É', 'Å',
                'å', 'Ä', 'Ö', 'Ñ', 'Ü', '§', 'ä', 'ö', 'ñ', 'ü', 'à',
                'Δ','Φ','Ξ','Γ','Ω','Π','Ψ','Σ','Θ','Λ'];

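checkBody(event){
    // Full current value of the field, whether typed, pasted, or templated
    let body = event.detail.value || '';
    this.charCount = body.length;
    // De-duplicate so each distinct character is checked only once
    let bodyCharSet = new Set(body.split(""));
    let bodyCharArray = Array.from(bodyCharSet);
    this.notGsm = bodyCharArray.filter(e => !gsmChars.includes(e)).length > 0;
    // One SMS holds 160 GSM or 70 UCS-2 characters; a split message loses
    // room to the concatenation header, leaving 153 or 67 per segment.
    if(body.length === 0){
        this.segmentCount = 0;
    }else if(this.notGsm){
        this.segmentCount = body.length <= 70 ? 1 : Math.ceil(body.length / 67);
    }else {
        this.segmentCount = body.length <= 160 ? 1 : Math.ceil(body.length / 153);
    }
}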
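The handler above treats the GSM extension characters (^ { } \ [ ~ ] | €) as non-GSM, which switches the whole message to UCS-2 limits. If a more precise estimate is wanted, a sketch that counts them as two septets each (an escape plus the character) might look like this; `estimateSegments`, `GSM_BASIC`, and `GSM_EXTENSION` are hypothetical names, not part of any Salesforce API:

```javascript
// GSM 03.38 basic set (one septet each) and extension table (two septets each).
const GSM_BASIC = new Set(
  "@£$¥èéùìòÇ\nØø\rÅå\u0394_\u03A6\u0393\u039B\u03A9\u03A0\u03A8\u03A3\u0398\u039EÆæßÉ" +
    " !\"#¤%&'()*+,-./0123456789:;<=>?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿" +
    "abcdefghijklmnopqrstuvwxyzäöñüà"
);
const GSM_EXTENSION = new Set("\f^{}\\[~]|\u20AC");

function estimateSegments(text) {
  if (text.length === 0) return { encoding: "GSM-7", units: 0, segments: 0 };
  let septets = 0;
  for (const ch of text) {
    if (GSM_BASIC.has(ch)) {
      septets += 1;
    } else if (GSM_EXTENSION.has(ch)) {
      septets += 2; // escape septet + character septet
    } else {
      // Any other character forces UCS-2; count UTF-16 code units,
      // so an emoji (a surrogate pair) counts as two.
      const units = text.length;
      return {
        encoding: "UCS-2",
        units,
        segments: units <= 70 ? 1 : Math.ceil(units / 67),
      };
    }
  }
  return {
    encoding: "GSM-7",
    units: septets,
    segments: septets <= 160 ? 1 : Math.ceil(septets / 153),
  };
}

console.log(estimateSegments("€".repeat(81)));
```

Some gateways may also avoid splitting an escape pair across two segments, which can add a segment in edge cases; this sketch ignores that.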