Apex Regex – Escaping Characters and Matcher Method Help

I have a string:

String allPunctuation = '~!@#$%^*()_+|}{":?><`=;/.,][-\'\\';

The string uses Apex escape sequences so that it can include the ' and \ characters.

When I pass this string into the replaceAll() method as the regex, I get an error, because the escape characters are not carried over and the regex needs them. Is there a way around this?

For example, when I call System.debug(allPunctuation) I get back: ~!@#$%^*()_+|}{":?><`=;/.,][-'\ The escape characters are clearly gone.

My second thought was to use the Matcher class. Documentation: https://developer.salesforce.com/docs/atlas.en-us.apexcode.meta/apexcode/apex_classes_pattern_and_matcher_using.htm

I am not sure why, but it returns false. It seems to me that it shouldn't be affected by the escaped quotes, and legalName has all sorts of punctuation in it.

Pattern NonAlphanumeric = Pattern.compile('[^a-zA-Z0-9 ]');
Matcher matcher = NonAlphanumeric.matcher(legalName);
//system.debug(matcher) ---> false

Could someone point me in the right direction?

Best Answer

You can match all punctuation with the \p{Punct} character class (written '\\p{Punct}' in an Apex string literal), as documented for the Pattern class. It matches:

!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

For example, the following code results in an empty String:

String s = '~!@#$%^*()_+|}{":?><`=;/.,][-\'\\';
System.debug(s.replaceAll('\\p{Punct}',''));
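The same call on a string that mixes letters and punctuation shows that only the punctuation is removed. This is a minimal sketch; the sample value is made up for illustration and is not from the question.

```apex
// Hypothetical sample value, chosen only to mix letters, spaces and punctuation.
String legalName = 'O\'Brien & Sons, Ltd.';

// \p{Punct} removes the apostrophe, ampersand, comma and period.
// Spaces are not punctuation, so the two spaces around the ampersand both remain.
String cleaned = legalName.replaceAll('\\p{Punct}', '');
System.debug(cleaned); // OBrien  Sons Ltd
```

If the doubled space matters, a second replaceAll that collapses runs of whitespace could follow.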

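Note that the "escapes" are not disappearing; the Apex compiler consumes them when it parses the string literal, so they never reach the regex engine. If you want the regex engine to see a literal backslash escape, you have to escape it twice: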

String s = '~!@#$%^*()_+|}{":?><`=;/.,][-\\\'\\\\';

Here, \\\' results in the regex engine seeing \', and \\\\ results in the engine seeing \\.
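The two layers of escaping can be checked directly by looking at string lengths. The sketch below is illustrative only; the variable names are made up.

```apex
// Layer 1: the Apex compiler turns each \\ in a literal into one backslash.
String oneSlash = '\\';      // one backslash character
String twoSlashes = '\\\\';  // two backslash characters
System.assertEquals(1, oneSlash.length());
System.assertEquals(2, twoSlashes.length());

// Layer 2: the regex engine reads the two-character string \\ as
// "match one literal backslash".
String sample = 'a\\b';      // the three characters a, \, b
System.assertEquals('ab', sample.replaceAll(twoSlashes, ''));
```

So four backslashes in source become two in the String, which the regex engine treats as one literal backslash.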

Adrian's solution also works, but I think that \p{Punct} is a bit more explicit about the intent of your code (to match any punctuation).
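On the Matcher half of the question: the post does not show which method produced false, so this is an assumption, but one likely cause is calling matches(), which succeeds only when the entire input matches the pattern. find() looks for a match anywhere in the input. A minimal sketch with a made-up legalName value:

```apex
// Hypothetical sample value; the real legalName is not shown in the question.
String legalName = 'Acme, Inc. (West)';
Pattern nonAlphanumeric = Pattern.compile('[^a-zA-Z0-9 ]');

// matches() asks whether the WHOLE string is a single non-alphanumeric character.
Matcher whole = nonAlphanumeric.matcher(legalName);
System.debug(whole.matches()); // false

// find() asks whether ANY part of the string matches; the comma does.
Matcher anywhere = nonAlphanumeric.matcher(legalName);
System.debug(anywhere.find()); // true

// The same pattern can strip those characters directly.
System.debug(legalName.replaceAll('[^a-zA-Z0-9 ]', '')); // Acme Inc West
```

A separate Matcher is used for each call so that the state left by the first call cannot affect the second.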
