[SalesForce] How to validate UTF-8 in regex

I have basic validation rules set up for name fields:

NOT(REGEX(FirstName, "^[A-Za-z\\. '-]+$"))

The goal is to only allow letters, periods, spaces, hyphens and apostrophes in the name field. The problem with this is that it does not allow accented characters (graphemes). I've tried some simplified ideas based on a regex tutorial and the Java Docs Salesforce links to, but they do not work:

NOT( REGEX( FirstName , "\\P{M}\\p{M}") )
NOT( REGEX( FirstName , "\\p{Alpha}") )
NOT( REGEX( FirstName , "\\X") )

Has anybody else run into this problem? How do you validate names with accent marks?

Update: After further testing I'm making some progress:
The validation rule REGEX(LastName, "(?>\\P{M}\\p{M}*)") successfully flags "é" as a match. Unfortunately that means pretty much any character is a match and I want to exclude numerals and most punctuation.

Best Answer

This might need some refinement, but my understanding is \p{L} will match "a single code point in the category 'letter'".

I tested the following as Anonymous Apex and got the Matches debug message.

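// Sample input containing an accented letter
String FirstName = 'Fredé';

// \p{L} matches any Unicode letter; the class also allows period, space, apostrophe and hyphen
Pattern regexPattern = Pattern.compile('^[\\p{L}\\. \'-]+$');
Matcher regexMatcher = regexPattern.matcher(FirstName);

// matches() only succeeds if the whole string fits the pattern
if (!regexMatcher.matches()) {
    System.debug(LoggingLevel.Warn, 'No Matches');
} else {
    System.debug(LoggingLevel.Debug, 'Matches');
}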

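According to the Regex Tutorial: Unicode Character Properties, you will probably also need to add \p{M}* to match any combining diacritical marks, because an accented letter can be encoded as a base letter followed by a separate mark: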

To match a letter including any diacritics, use \p{L}\p{M}*. This last regex will always match à, regardless of how it is encoded.
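
As a rough sketch of how \p{L}\p{M}* could be combined with the punctuation from the original rule: the example below is plain Java rather than Apex (Apex's Pattern and Matcher follow the Java classes, so the regex should carry over, but that is an assumption). The class name NameRegexDemo, the method isValidName and the exact pattern are illustrative, not taken from the tutorial or from Salesforce.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

class NameRegexDemo {
    // One or more "units", where a unit is either a letter followed by any
    // combining marks (\p{L}\p{M}*) or one of: period, space, apostrophe, hyphen.
    static final Pattern NAME = Pattern.compile("^(?:\\p{L}\\p{M}*|[. '-])+$");

    // Hypothetical helper: true when the whole string fits the pattern.
    static boolean isValidName(String name) {
        return NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        String composed = "Fred\u00E9"; // é as a single code point
        // NFD splits é into 'e' followed by the combining acute accent U+0301
        String decomposed = Normalizer.normalize(composed, Normalizer.Form.NFD);

        System.out.println(isValidName(composed));   // true
        System.out.println(isValidName(decomposed)); // true
        System.out.println(isValidName("Fred3"));    // false: digits are not letters
    }
}
```

The equivalent validation rule would presumably be NOT(REGEX(FirstName, "^(?:\\p{L}\\p{M}*|[. '-])+$")), though that form is untested here.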

Related Solutions

[SalesForce] Regex Validation rule for telephone number

I would recommend a two-step approach. First, set your length restrictions aside and find a well-tested, working regex for phone numbers. You don't have to reinvent the wheel.

Then try a formula something like this:

NOT(  
  REGEX( SamplePhone__c, "^\\+([0-9 ]+)$" ) 
  &&  LEN(  SUBSTITUTE(SamplePhone__c , " ", "")   ) < 26  
  &&  LEN(  SUBSTITUTE(SamplePhone__c , " ", "")   ) > 6 
)

Replace the regex with the one of your choice and use nested SUBSTITUTE(...) calls to remove any other characters you don't want to count against your min and max.

The idea is to decouple the regex and the length checks and combine them logically.
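
To illustrate the decoupling outside of a formula, here is a minimal Java sketch of the same logic: one regex for the shape, and separate length checks on the value with spaces removed. The class name PhoneCheckDemo and the method isValidPhone are made up for the example. Note that, as in the formula, the leading "+" counts toward the length.

```java
import java.util.regex.Pattern;

class PhoneCheckDemo {
    // Shape only: a leading '+' followed by digits and spaces (same regex as the formula).
    static final Pattern SHAPE = Pattern.compile("^\\+([0-9 ]+)$");

    // Hypothetical helper: combines the shape check with independent length checks.
    static boolean isValidPhone(String phone) {
        // Mirrors LEN(SUBSTITUTE(SamplePhone__c, " ", "")): spaces do not count.
        int length = phone.replace(" ", "").length();
        return SHAPE.matcher(phone).matches() && length < 26 && length > 6;
    }

    public static void main(String[] args) {
        System.out.println(isValidPhone("+49 30 123456")); // true
        System.out.println(isValidPhone("+1 23"));         // false: too short
        System.out.println(isValidPhone("49 30 123456"));  // false: no leading '+'
    }
}
```

Because the two checks are independent, swapping in a different phone regex does not require touching the length limits.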

[SalesForce] Regex to validate Credit Card numbers

This is really more of a regex question than an SFDC question, so it truly belongs elsewhere on StackExchange.

However, if you know that your patterns work independently, you could change the definition of the validation rule (VR) to use an OR() statement instead of putting the alternatives inside the regex pattern itself. This also greatly simplifies troubleshooting, because each pattern can be tested on its own.

NOT( 
    OR(
        REGEX( Customfield__c , "Visa Pattern"),
        REGEX( Customfield__c , "Mastercard Pattern"),
        REGEX( Customfield__c , "Discover Pattern"),
        REGEX( Customfield__c , "AmEx Pattern")
    )
)
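
The same "any one pattern matches" structure can be sketched in Java as a loop over separate patterns. The four regexes below are commonly quoted, simplified card-prefix patterns used purely as placeholders; they are assumptions, not the patterns from the question, they do not cover every issuer range, and they perform no checksum validation. The class name CardCheckDemo and the method matchesAnyBrand are also made up for the example.

```java
import java.util.regex.Pattern;

class CardCheckDemo {
    // Illustrative placeholder patterns, one per brand (digits only, no separators).
    static final Pattern[] BRANDS = {
        Pattern.compile("^4[0-9]{12}(?:[0-9]{3})?$"),     // Visa: starts with 4, 13 or 16 digits
        Pattern.compile("^5[1-5][0-9]{14}$"),              // Mastercard: 51-55 prefix only
        Pattern.compile("^6(?:011|5[0-9]{2})[0-9]{12}$"),  // Discover: 6011 or 65 prefix
        Pattern.compile("^3[47][0-9]{13}$")                // AmEx: 34 or 37 prefix, 15 digits
    };

    // Equivalent of OR(REGEX(...), REGEX(...), ...): true if any pattern matches.
    static boolean matchesAnyBrand(String number) {
        for (Pattern brand : BRANDS) {
            if (brand.matcher(number).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matchesAnyBrand("4111111111111111")); // true
        System.out.println(matchesAnyBrand("1234567890123456")); // false
    }
}
```

Keeping each pattern separate means a failing number can be checked against one brand at a time when troubleshooting.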
