[SalesForce] Unicode non-breaking space is not considered white space

Can anyone confirm that the Unicode non-breaking space (\u00A0) is not considered "whitespace" by Apex and is not detected by trim(), deleteWhitespace(), or regex? I'm surprised by the regex result, since I thought \s was supposed to include non-breaking spaces.

Of the methods below, only replaceAll with the character code works.

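// Build a 5-character string: one non-breaking space followed by 'Test'.
String x = '\u00A0' + 'Test';
String y = x.unescapeUnicode();
// The first three calls leave the non-breaking space in place.
system.debug('### y trim length: ' + y.trim().length());
system.debug('### y deleteWhitespace length: ' + y.deleteWhitespace().length());
system.debug('### y replaceall regex length: ' + y.replaceAll('\\s', '').length());
// Only matching the code point explicitly removes it.
system.debug('### y replaceall unicode length: ' + y.replaceAll('\\u00A0', '').length());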

Best Answer

The non-breaking space is not whitespace, according to Java. Apex Code uses the same rules as the Java Pattern class, which specifies \s as follows:

\s  A whitespace character: [ \t\n\x0B\f\r]

Where " " is 0x20, \t is 0x09, \n is 0x0A, \x0B is 0x0B, \f is 0x0C, and \r is 0x0D. No other characters are defined as whitespace, despite Unicode having a number of them.
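To make the rule above concrete, here is a minimal Java sketch of the behavior described, on the assumption (stated in this answer) that Apex follows the same Pattern rules. The class and method names are hypothetical, chosen only for illustration. It contrasts the default \s class with a character class that adds \u00A0 explicitly, which mirrors the replaceAll workaround in the question.

```java
public class NbspWhitespaceDemo {

    // Default \s covers only [ \t\n\x0B\f\r], so U+00A0 is left in place.
    static String stripDefault(String s) {
        return s.replaceAll("\\s", "");
    }

    // Adding \u00A0 to the character class removes the non-breaking space too.
    static String stripWithNbsp(String s) {
        return s.replaceAll("[\\s\\u00A0]", "");
    }

    public static void main(String[] args) {
        String y = "\u00A0" + "Test";

        // Java does not classify U+00A0 as whitespace.
        System.out.println(Character.isWhitespace('\u00A0'));  // false
        System.out.println(stripDefault(y).length());          // 5
        System.out.println(stripWithNbsp(y).length());         // 4
    }
}
```

In Apex, the equivalent pattern string would be '[\\s\\u00A0]' passed to replaceAll, which should behave the same way if the Pattern rules match as described.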
