Remove all non-“word characters” from a String in Java, leaving accented characters?

被撕碎了的回忆 2020-11-28 02:18

Apparently Java's regex flavor counts umlauts and other special characters as non-"word characters" when I use regex.

        "TESTÜTEST".replaceAll( "


        
5 Answers
  •  感情败类
    2020-11-28 02:48

    I was trying to achieve the exact opposite when I stumbled on this thread. I know it's quite old, but here's my solution nonetheless. You can match on Unicode blocks with the \p{In...} syntax. In this case, try the following code (it needs java.util.regex.Pattern and java.util.regex.Matcher):

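        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        String s = "äêìóblah";
        // \p{InLatin-1Supplement} matches any character in the Latin-1 Supplement block (U+0080 to U+00FF)
        Pattern p = Pattern.compile("[\\p{InLatin-1Supplement}]+");
        Matcher m = p.matcher(s);
        System.out.println(m.find());                       // is there at least one match?
        System.out.println(s.replaceAll(p.pattern(), "#")); // replace each run of such characters with "#"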
    

    You should see the following output:

    true

    #blah

    Best,
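
For the goal in the question title (strip non-word characters but keep accented letters), here is a minimal sketch rather than a definitive answer. It assumes the input looks like the question's TESTÜTEST string plus some punctuation; the class name KeepAccents and both helper methods are made up for illustration. It relies on two things in java.util.regex: the Unicode categories \p{L}, \p{M} and \p{N}, and the Pattern.UNICODE_CHARACTER_CLASS flag (Java 7 and later), which makes \w and \W Unicode-aware.

```java
import java.util.regex.Pattern;

public class KeepAccents {

    // Hypothetical helper: remove everything that is not a Unicode letter,
    // combining mark, or number.
    static String keepLettersAndDigits(String input) {
        return input.replaceAll("[^\\p{L}\\p{M}\\p{N}]", "");
    }

    // Hypothetical helper: same idea, but via \W with Unicode-aware classes.
    // Unlike the helper above, this also keeps connector punctuation such as '_'.
    static String stripNonWordUnicode(String input) {
        return Pattern.compile("\\W", Pattern.UNICODE_CHARACTER_CLASS)
                      .matcher(input)
                      .replaceAll("");
    }

    public static void main(String[] args) {
        // \u00DC is the letter Ü, written as an escape to avoid source-encoding issues.
        String s = "TEST\u00DCTEST!?";

        // By default \w is ASCII-only ([a-zA-Z_0-9]), so Ü is removed as well.
        System.out.println(s.replaceAll("\\W", ""));

        // Both helpers keep the Ü and drop only the punctuation.
        System.out.println(keepLettersAndDigits(s));
        System.out.println(stripNonWordUnicode(s));
    }
}
```

Which variant fits better depends on whether underscores should survive: the \p{L}-based class drops them, while \W with UNICODE_CHARACTER_CLASS keeps them.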
