Remove all non-“word characters” from a String in Java, leaving accented characters?

被撕碎了的回忆 2020-11-28 02:18

Apparently Java's regex flavor treats umlauts and other accented characters as non-"word characters".

        "TESTÜTEST".replaceAll( "


        
5 Answers
  •  广开言路
    2020-11-28 02:56

    Use [^\p{L}\p{Nd}]+ - this matches any run of (Unicode) characters that are neither letters nor (decimal) digits.

    In Java:

    String resultString = subjectString.replaceAll("[^\\p{L}\\p{Nd}]+", "");
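
To see the one-liner above in a runnable form, here is a minimal sketch. The class and method names are hypothetical, and the `\\W+` variant is only an assumption about the kind of pattern that causes the behavior described in the question; it is included for contrast.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; the names are not from the answer.
class UnicodeWordFilter {
    // The answer's pattern: runs of characters that are neither
    // Unicode letters (\p{L}) nor decimal digits (\p{Nd}).
    private static final Pattern NON_LETTER_OR_DIGIT =
            Pattern.compile("[^\\p{L}\\p{Nd}]+");

    // For contrast (an assumption, not from the answer): without extra flags,
    // \W treats only [a-zA-Z_0-9] as word characters.
    private static final Pattern ASCII_NON_WORD = Pattern.compile("\\W+");

    static String stripUnicodeAware(String input) {
        return NON_LETTER_OR_DIGIT.matcher(input).replaceAll("");
    }

    static String stripAsciiOnly(String input) {
        return ASCII_NON_WORD.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // U+00DC is the capital U with diaeresis from the question.
        String sample = "TEST-\u00DC_TEST 42!";
        System.out.println(stripUnicodeAware(sample));
        System.out.println(stripAsciiOnly(sample));
    }
}
```

With this sample, `stripUnicodeAware` returns `TESTÜTEST42`, while `stripAsciiOnly` returns `TEST_TEST42`: the default `\W` drops the umlaut but keeps the underscore, whereas the answer's pattern does the opposite. One caveat: combining marks belong to `\p{M}`, not `\p{L}`, so an accent stored as a separate combining character would be stripped; normalizing the input to NFC first is one way around that.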
    

    Edit:

    I changed \p{N} to \p{Nd} because the former also matches some number symbols like ¼; the latter doesn't. See it on regex101.com.
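
The difference between the two number classes can be checked with a short sketch like the following. The class and method names are hypothetical, chosen only for this illustration.

```java
// Hypothetical demo class; names are illustrative only.
class NumberCategoryDemo {
    // Keeps letters plus every character in the Unicode Number category (\p{N}).
    static String keepAnyNumber(String input) {
        return input.replaceAll("[^\\p{L}\\p{N}]+", "");
    }

    // Keeps letters plus decimal digits only (\p{Nd}).
    static String keepDecimalDigits(String input) {
        return input.replaceAll("[^\\p{L}\\p{Nd}]+", "");
    }

    public static void main(String[] args) {
        // U+00BC is the vulgar fraction one quarter (category No, "other number").
        String sample = "a\u00BCb1";
        System.out.println(keepAnyNumber(sample));
        System.out.println(keepDecimalDigits(sample));
    }
}
```

Here `keepAnyNumber` leaves the ¼ in place (`a¼b1`), while `keepDecimalDigits` removes it (`ab1`).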
