[Salesforce] Issues with Regex Matcher in Apex

I'm looking to create a regex that will grab all of the lines of text which comply with the following structure:

  • Group1: 2-3 upper-case letters
  • Group2: A date (day, month and year) with flexible separators
  • Group3: Any block of text up to the end of the line

Any whitespace between the groups should be ignored. The regex I have built so far is (?m)([A-Z]{2,3})[\s]+([0-9]+[-\/\.][0-9][0-9][-\/\.][0-9]+)[\s:,-]+([^\n]+)$ and it works on the few regex testing sites I have tried, using this sample text:

#Rubbish
AA 22/05/2017: First block of text. \n
BB 15/05/2017: Second block of text. \n
AA 01/05/2017: Third block  of text \n\n

Rubbish block

To be precise, I tried it on https://regex101.com/: with the global flag enabled every row is matched, and without it only the first row is, but I at least get a match. When I move the regex into Apex, I end up with this code:

string message = '#Rubbish \n' + 
'AA 22/05/2017: First block of text. \n' +
'BB 15/05/2017: Second block of text. \n' +
'AA 01/05/2017: Third block  of text \n\n' +
'Rubbish block';

System.debug(message);

// Preparing regex
Pattern regex = Pattern.compile('(?m)([A-Z]{2,3})[\s]+([0-9]+[-\/\.][0-9][0-9][-\/\.][0-9]+)[\s:,-]+([^\n]+)$');
Matcher regexMatcher = regex.matcher(message);

if(regexMatcher.matches() == true) {
    System.debug(regexMatcher);
}
else {
    System.debug('no');
}

Initially I got compilation errors. I experimented with the regex string, escaping each \ by adding a second one, but even once the compilation errors were gone I still could not get any matches.

Could anybody have a look and tell me what is wrong? I'm convinced it's a dumb oversight, but I'm still not managing to see the issue.

Best Answer

There are two issues here: one is escaping the regex correctly, and the other is the semantics of checking for a match. The following code works:

Pattern regex = Pattern.compile('(?m)([A-Z]{2,3})[\\s]+([0-9]+[\\-/.][0-9][0-9][\\-/.][0-9]+)[\\s:,-]+([^\\n]+)$');
Matcher regexMatcher = regex.matcher(message);

while (regexMatcher.find()) {
    System.debug(regexMatcher.group());
}

09:43:06:004 USER_DEBUG [14]|DEBUG|AA 22/05/2017: First block of text.
09:43:06:004 USER_DEBUG [14]|DEBUG|BB 15/05/2017: Second block of text.
09:43:06:004 USER_DEBUG [14]|DEBUG|AA 01/05/2017: Third block of text

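Note that Matcher.matches() returns true only when the entire region (the whole string, in this case) matches the pattern, which is not the case here because of the surrounding rubbish lines. find() instead scans for the next subsequence that matches and returns true each time it finds one, so looping on it visits every matching line.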
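
The difference between the two calls can be reproduced outside Salesforce. Apex regular expressions follow Java's syntax, so the following is a minimal sketch in plain Java rather than Apex; the class and method names (MatchesVsFind, matchesWhole, findAll) are made up for illustration, and in Apex the string literals would use single quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original post.
class MatchesVsFind {
    // Same sample text as in the question.
    static final String MESSAGE = "#Rubbish \n"
            + "AA 22/05/2017: First block of text. \n"
            + "BB 15/05/2017: Second block of text. \n"
            + "AA 01/05/2017: Third block  of text \n\n"
            + "Rubbish block";

    // Same pattern as in the answer; every regex backslash is doubled in the literal.
    static final Pattern REGEX = Pattern.compile(
            "(?m)([A-Z]{2,3})[\\s]+([0-9]+[\\-/.][0-9][0-9][\\-/.][0-9]+)[\\s:,-]+([^\\n]+)$");

    // matches() succeeds only if the whole input is a single match.
    static boolean matchesWhole(String text) {
        return REGEX.matcher(text).matches();
    }

    // find() moves to the next match on each call, so a loop collects them all.
    static List<String> findAll(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = REGEX.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // false: the text starts with "#Rubbish", so it is not one whole match
        System.out.println(matchesWhole(MESSAGE));
        // prints the three AA/BB/AA lines, one per find() hit
        for (String hit : findAll(MESSAGE)) {
            System.out.println(hit);
        }
    }
}
```

Each hit keeps the trailing space that precedes the line break, since ([^\n]+) consumes everything up to the newline.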

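Additionally, you must escape every backslash in an Apex string literal (write \\s in the source to get the regex \s), which is what resolves the compilation errors. There's no need to escape a forward slash or period inside a regex character class, and a hyphen placed first or last in the class is also taken literally.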
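
To illustrate the escaping rules and the three capture groups, here is a small sketch, again in plain Java on the assumption that Apex treats the regex the same way. The class LineParts and its split method are hypothetical names, and the pattern is a per-line variant of the one above with the separators left unescaped inside the character class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original post.
class LineParts {
    // "\\s" in source code is the two-character regex token \s.
    // Inside [...] the characters / and . need no escaping, and a hyphen
    // placed first or last in the class is a literal hyphen.
    static final Pattern LINE = Pattern.compile(
            "([A-Z]{2,3})\\s+([0-9]+[-/.][0-9][0-9][-/.][0-9]+)[\\s:,-]+([^\\n]+)");

    // Returns {code, date, text} for a matching line, or null if there is no match.
    static String[] split(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        // trim() drops the trailing whitespace that group 3 picks up.
        return new String[] { m.group(1), m.group(2), m.group(3).trim() };
    }

    public static void main(String[] args) {
        String[] parts = split("BB 15-05-2017, Second block of text. ");
        // prints: BB | 15-05-2017 | Second block of text.
        System.out.println(parts[0] + " | " + parts[1] + " | " + parts[2]);
    }
}
```

The same groups are available in the answer's loop through regexMatcher.group(1), group(2) and group(3).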
