[SalesForce] Regular Expression to find Chinese characters

If a string contains at least one Chinese character, I have to return a Boolean. I am looking for the exact regular expression to use with the Pattern.matches method.

Any help is much appreciated.

Best Answer

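Java and Apex regular expressions support Unicode script properties, so you can match any Han character with \p{IsHan}. Note that Pattern.matches only succeeds when the entire string matches the pattern, so use Matcher.find() to test for at least one occurrence: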

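public static Boolean containsChineseCharacters(String inputString){
    // \p{IsHan} matches any single character in the Unicode Han script
    Pattern p = Pattern.compile('\\p{IsHan}');
    Matcher m = p.matcher( inputString );
    // find() returns true if the pattern occurs anywhere in the string
    return m.find();
}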
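
For comparison, here is a minimal plain-Java sketch of the same check, mainly to show the difference between find() and Pattern.matches. The class name HanCheck and method name containsHan are made up for illustration, and the null check is my own assumption about how you would want null input handled. The sample strings are written as Unicode escapes so the file compiles regardless of source encoding.

```java
import java.util.regex.Pattern;

class HanCheck {
    // Compiled once; \p{IsHan} matches any code point whose Unicode script is Han.
    private static final Pattern HAN = Pattern.compile("\\p{IsHan}");

    // True if the input contains at least one Han character anywhere.
    // Treating null as "no Han characters" is an assumption, not part of the original answer.
    static boolean containsHan(String input) {
        return input != null && HAN.matcher(input).find();
    }

    public static void main(String[] args) {
        String mixed = "Hello \u4e16\u754c";   // Latin text followed by two Han characters
        String latin = "Hello world";           // no Han characters
        String kanji = "\u65e5\u672c";          // Han characters as used in Japanese
        String kana  = "\u304b\u306a";          // Hiragana only, not Han script

        System.out.println(containsHan(mixed)); // true
        System.out.println(containsHan(latin)); // false
        System.out.println(containsHan(kanji)); // true
        System.out.println(containsHan(kana));  // false

        // Pattern.matches requires the WHOLE string to match, so this is false
        // even though the string contains Han characters.
        System.out.println(Pattern.matches("\\p{IsHan}", mixed)); // false
    }
}
```

The kanji case returning true is the limitation described below: the Han script property cannot tell Chinese usage from Japanese usage.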

If you want your regular expression to find characters that are Chinese Hanzi and not also Japanese Kanji, there isn't an easy way to do it.

The most commonly used CJKV ideographs are found in the Unicode CJK Unified Ideographs block (U+4E00 to U+9FFF)*. Many of these characters are used by multiple languages, and what makes your regex difficult is that the characters aren't separated by language. There is no sub-block for "Just Chinese" or "Chinese and Japanese". Unicode orders the characters by radical and stroke count, which means that characters from all of these languages are interspersed. Your regular expression would have to look for a very large number of small ranges and individual code points.

*There are other locations that contain CJKV ideographs, but this highlights the difficulty of the task.
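
To illustrate the footnote, here is a rough Java sketch comparing a hard-coded range for the main block against the script property. The class and method names are hypothetical. U+3400 is the first code point of CJK Unified Ideographs Extension A, so it is a Han character that sits outside the main block.

```java
import java.util.regex.Pattern;

class BlockVsScript {
    // Main CJK Unified Ideographs block only: U+4E00 through U+9FFF.
    private static final Pattern MAIN_BLOCK = Pattern.compile("[\\u4E00-\\u9FFF]");
    // Any character whose Unicode script is Han, regardless of block.
    private static final Pattern HAN_SCRIPT = Pattern.compile("\\p{IsHan}");

    static boolean inMainBlock(String s) {
        return MAIN_BLOCK.matcher(s).find();
    }

    static boolean isHanScript(String s) {
        return HAN_SCRIPT.matcher(s).find();
    }

    public static void main(String[] args) {
        String common = "\u6f22";     // U+6F22, inside the main block
        String extensionA = "\u3400"; // U+3400, Extension A, outside the main block

        System.out.println(inMainBlock(common));     // true
        System.out.println(isHanScript(common));     // true
        System.out.println(inMainBlock(extensionA)); // false
        System.out.println(isHanScript(extensionA)); // true
    }
}
```

Neither pattern says anything about language. Both only tell you that an ideograph is present, which is why a "Chinese but not Japanese" regex would need a hand-maintained list of ranges and code points.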
