[SalesForce] Regex to evaluate string with specific format in Process Builder

A team I work with is sending me leads with a single tracking code that represents several different facts about the leads, separated by ">" symbols. I've got a Process that is successfully splitting up the value and saving it to the appropriate fields, so when everything is formatted correctly, it works just fine.

The problem is that the systems/people who are sending me these values don't always get it right, and I can't guarantee the formatting (specifically, the inclusion of four ">" symbols) will always be correct. I'm trying to add an earlier step in the process to catch errors and save them to a different field to allow for manual correction/follow-up with the team that's made the error, and I think a REGEX expression is the right way to go, but it's not working as expected.

Here's the formula I'm using in my first decision point in the process. The business intent is to trigger only when the value has changed AND there is a value in the field AND that value doesn't adhere to the proper format (XXX>XXX>XXX>XXX>XXX, where each XXX stands for any number of characters from a-z, A-Z, 0-9, &, -, ., or /, and any segment may be empty):

AND(
  ISCHANGED([Lead].MarketingCode__c),
  NOT(ISBLANK([Lead].MarketingCode__c)),
  NOT(REGEX([Lead].MarketingCode__c,
            "[\\w&-.\\/]*>[\\w&-.\\/]*>[\\w&-.\\/]*>[\\w&-.\\/]*>[\\w&-.\\/]"))
)

Unfortunately, this step is now always being hit, even when the format of MarketingCode__c is correct. I've done very little work with REGEX, so my first suspicion is that I've entered the expression incorrectly. Can anyone provide feedback on where I may have gone wrong? Thanks!

Best Answer

First, inside a character class, &-. is a range that matches every character from & to ., including ', (, ), *, +, and the comma. To match a literal dash, move it to the beginning or the end of the class, or escape it.

This happens whenever a character class contains the form x-y, where x and y are any two characters, because that forms a character range. a-z is the usual case, but the endpoints can be any two characters. Here it means the class accepts characters you never intended to allow, which may cause some false positives/negatives.
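To see the range effect in isolation, here is a minimal sketch using Python's re module as a stand-in for the Salesforce REGEX function (an assumption: the two engines agree on how a dash inside a character class is read, but they are not the same engine). The sample characters are chosen for illustration only.

```python
import re

# Inside a character class, "&-." is a range from "&" (0x26) to "." (0x2E).
unintended = re.compile(r"[&-.]")
# With the dash moved to the end, it is a literal dash.
intended = re.compile(r"[&.-]")

samples = ["&", "-", ".", "*", "(", ")", "'", "+", ","]

range_hits = [c for c in samples if unintended.fullmatch(c)]
literal_hits = [c for c in samples if intended.fullmatch(c)]

print(range_hits)    # every sample falls inside the range
print(literal_hits)  # only "&", "-" and "." match
```

The dash still matches in the range version only because it happens to sit between & and . in ASCII, which is why the mistake is easy to miss.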

Second, you can use the {} quantifier to control how many times something repeats, and () groups to repeat a whole sub-pattern, which shortens the expression. The long form is simply harder to read.

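Third, instead of using *, I'd recommend using {0,3} if you want to limit each segment to three characters. Write the lower bound explicitly: Java's regex syntax, which the Salesforce REGEX function is based on, does not accept the {,3} shorthand.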

Fourth, be aware that \w also matches _, so you might want to spell the class out as a-zA-Z0-9, which is \w minus the underscore.
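The underscore point can be checked with a small sketch, again in Python's re module rather than Salesforce itself. The re.ASCII flag is used on the assumption that it brings Python's \w closer to Java's default ASCII-only \w; the segment value "web_ad" is a made-up example.

```python
import re

# \w covers letters, digits AND the underscore.
with_w = re.compile(r"[\w&./-]*", re.ASCII)
# Spelling the class out leaves the underscore excluded.
explicit = re.compile(r"[a-zA-Z0-9&./-]*")

segment = "web_ad"  # hypothetical segment containing an underscore

print(bool(with_w.fullmatch(segment)))    # underscore is accepted
print(bool(explicit.fullmatch(segment)))  # underscore is rejected
```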

This leads me to a final pattern of:

"([a-zA-Z0-9&.\\/-]{0,3}>){4}[a-zA-Z0-9&.\\/-]{0,3}"

The rest of your formula looks okay, but you'll want to test this out to make sure. I wrote this as a validation rule in an object in my dev org, because I hate building processes just to test something like this, but it should work, at least in theory.
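If you would rather try the pattern outside Salesforce first, here is a rough sketch in Python. It assumes whole-string matching (re.fullmatch) is what you want to mimic, writes the quantifier with an explicit lower bound as {0,3}, and uses single backslashes because Python raw strings don't need the formula's doubled escaping. The helper name is_well_formed and all sample codes are invented for illustration; behaviour in the real formula engine may differ.

```python
import re

# One segment: up to three allowed characters (may be empty).
SEGMENT = r"[a-zA-Z0-9&./-]{0,3}"
# Four "segment>" groups followed by a final segment.
PATTERN = re.compile(rf"({SEGMENT}>){{4}}{SEGMENT}")


def is_well_formed(code: str) -> bool:
    """Return True when the whole string is five segments joined by four '>'."""
    return PATTERN.fullmatch(code) is not None


# Hypothetical sample values mapped to the result each should produce.
samples = {
    "abc>de>f>123>x-y": True,
    ">>>>": True,                # every segment omitted
    "abc>de>f>123": False,       # only three '>'
    "abc>de>f>123>x>y": False,   # five '>'
    "abcd>de>f>123>xy": False,   # first segment has four characters
    "a_b>de>f>123>xy": False,    # underscore is not in the class
}

for code, expected in samples.items():
    print(f"{code!r}: {is_well_formed(code)} (expected {expected})")
```

The process criteria wrap the check in NOT(...), so the values that come back False here are the ones the decision point is meant to catch. If segments really can be any length, as the question describes, replacing {0,3} with * in SEGMENT keeps the rest of the structure unchanged.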
