How to remove high-ASCII characters like ®, ©, ™ from a string in Java

庸人自扰 2020-12-06 05:52

I want to detect and remove high-ASCII characters such as ®, ©, and ™ from a String in Java. Is there an open-source library that can do this?

4 answers
  •  生来不讨喜
    2020-12-06 06:55

    If you need to remove all non-US-ASCII characters (i.e., anything outside 0x00-0x7F, which includes ® U+00AE, © U+00A9 and ™ U+2122), you can do something like this:

    s = s.replaceAll("[^\\x00-\\x7f]", "");
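
A minimal runnable sketch of the one-liner above, assuming a standard JDK. The class and method names (StripNonAsciiDemo, stripNonAscii) are hypothetical, chosen only for illustration. The input is written with Unicode escapes so the result does not depend on the source file's encoding.

```java
// Hypothetical demo class for illustration; wraps the one-line regex approach.
final class StripNonAsciiDemo {

    // Removes every character outside U+0000..U+007F.
    static String stripNonAscii(String s) {
        return s.replaceAll("[^\\x00-\\x7f]", "");
    }

    public static void main(String[] args) {
        // Input contains U+00AE, U+00A9 and U+2122 (registered, copyright, trademark).
        String input = "Acme\u00AE Widget\u00A9 Pro\u2122";
        System.out.println(stripNonAscii(input)); // prints "Acme Widget Pro"
    }
}
```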
    

    If you need to filter many strings, it is better to compile the pattern once and reuse it, because String.replaceAll compiles a new Pattern on every call:

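    import java.util.regex.Pattern;

    private static final Pattern NON_ASCII = Pattern.compile("[^\\x00-\\x7f]");
    ...
    // Matcher.replaceAll needs a replacement argument; "" deletes each match
    s = NON_ASCII.matcher(s).replaceAll("");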
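
A sketch of the precompiled-pattern approach that also covers the "detect" half of the question: Matcher.find() reports whether any non-ASCII character is present, and replaceAll("") removes them. AsciiFilter, containsNonAscii and stripNonAscii are hypothetical names, and List.of assumes Java 9 or later.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class: one precompiled Pattern reused for detection and removal.
final class AsciiFilter {

    // Matches any single character outside U+0000..U+007F.
    private static final Pattern NON_ASCII = Pattern.compile("[^\\x00-\\x7f]");

    // Detection: true if at least one non-ASCII character is present.
    static boolean containsNonAscii(String s) {
        return NON_ASCII.matcher(s).find();
    }

    // Removal: deletes every non-ASCII character.
    static String stripNonAscii(String s) {
        return NON_ASCII.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        List<String> inputs = List.of("Brand\u00AE", "Copyright \u00A9 2020", "plain text");
        for (String s : inputs) {
            if (containsNonAscii(s)) {
                System.out.println(s + " -> " + stripNonAscii(s));
            } else {
                System.out.println(s + " (already ASCII)");
            }
        }
    }
}
```

Removal deletes only the matched characters, so surrounding whitespace is left as it was: "Copyright © 2020" becomes "Copyright  2020" with two spaces.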
    

    And if it's really performance-critical, perhaps Alex Nikolaenkov's suggestion would be better.
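
Alex Nikolaenkov's answer isn't reproduced here, so the following is not necessarily what he suggested. It is one common regex-free alternative: a single pass over the string's chars that keeps only those at or below 0x7F, with a fast path that returns the original string when nothing needs removing. AsciiLoopFilter is a hypothetical name, and any speed advantage over the precompiled Pattern is an assumption worth measuring (e.g. with JMH) before relying on it.

```java
// Hypothetical helper class: regex-free, single pass with a StringBuilder.
// Not necessarily the approach suggested in the other answer.
final class AsciiLoopFilter {

    static String stripNonAscii(String s) {
        // Fast path: locate the first non-ASCII char; if there is none, return s unchanged.
        int first = -1;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                first = i;
                break;
            }
        }
        if (first < 0) {
            return s;
        }
        StringBuilder sb = new StringBuilder(s.length());
        sb.append(s, 0, first); // copy the all-ASCII prefix in one call
        for (int i = first + 1; i < s.length(); i++) {
            char c = s.charAt(i);
            // Both halves of a surrogate pair (e.g. emoji) are > 0x7F, so both are dropped.
            if (c <= 0x7F) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripNonAscii("Acme\u00AE Widget\u00A9 Pro\u2122")); // prints "Acme Widget Pro"
    }
}
```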
