Remove all non-“word characters” from a String in Java, leaving accented characters?

被撕碎了的回忆 2020-11-28 02:18

Apparently Java's regex flavor treats umlauts and other accented characters as non-"word characters".

        "TESTÜTEST".replaceAll( "

5 answers
  •  春和景丽
    2020-11-28 02:55

    You might want to remove the accents and diacritic signs first, then check each character position: if the "simplified" character is an ASCII letter, the original character at that position is a word character and should be kept; if not, it can be removed.
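
    The idea above could be sketched roughly as follows. This is only one possible reading of it, not a definitive implementation: the class and method names (AccentAwareFilter, isWordLike, keepWordChars) are made up for illustration, and treating digits and the underscore as word characters alongside ASCII letters is an assumption. Each code point is decomposed on its own with java.text.Normalizer, so the positions of the original and the "simplified" text cannot drift apart.

```java
import java.text.Normalizer;

public class AccentAwareFilter {

    // True if the code point, once its diacritics are stripped,
    // consists only of ASCII letters, digits or underscores.
    static boolean isWordLike(int codePoint) {
        String single = new String(Character.toChars(codePoint));
        // NFD splits e.g. 'Ü' into 'U' followed by a combining diaeresis.
        String decomposed = Normalizer.normalize(single, Normalizer.Form.NFD);
        // Drop the combining marks to get the "simplified" character.
        String simplified = decomposed.replaceAll("\\p{M}", "");
        return simplified.matches("[A-Za-z0-9_]+");
    }

    // Keeps the original (still accented) characters whose simplified
    // form is word-like and drops everything else.
    static String keepWordChars(String input) {
        // Compose first so that 'U' + combining mark becomes a single 'Ü'.
        String composed = Normalizer.normalize(input, Normalizer.Form.NFC);
        StringBuilder out = new StringBuilder();
        composed.codePoints()
                .filter(AccentAwareFilter::isWordLike)
                .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(keepWordChars("TEST-Ü!TEST é_1")); // prints TESTÜTESTé_1
    }
}
```

    One limitation of this sketch: letters that have no canonical decomposition onto an ASCII base, such as ß or ø, are removed as well, so it only covers accented forms of A-Z.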
