Replace Unicode Control Characters

时光取名叫无心 2020-12-09 06:50

I need to replace all special control characters in a string in Java.

I want to query the Google Maps API v3, and Google doesn't seem to like these characters.

1 answer
  • 2020-12-09 07:29

    If you want to delete all characters in the Unicode Other/Control category (Cc), you can do something like this:

        System.out.println(
            "a\u0000b\u0007c\u008fd".replaceAll("\\p{Cc}", "")
        ); // abcd
    

    Note that this removes (among others) the actual Unicode character '\u008f' from the string, not the percent-escaped text "%8F".
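
    To make that distinction concrete, here is a minimal sketch. The class name ControlCharStripper and the helper stripControl are hypothetical names chosen for illustration; only java.util.regex.Pattern comes from the JDK. It precompiles the pattern once, which may be worthwhile if the replacement runs often:

```java
import java.util.regex.Pattern;

public class ControlCharStripper {
    // Compiled once and reused; \p{Cc} is the Unicode "Other, Control" general category.
    private static final Pattern CONTROL = Pattern.compile("\\p{Cc}");

    // Hypothetical helper: returns the input with every Cc character removed.
    public static String stripControl(String input) {
        return CONTROL.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // A real U+008F control character sits between 'a' and 'b' and is removed.
        System.out.println(stripControl("a\u008fb")); // ab
        // The text "%8F" is three ordinary characters, so nothing is removed.
        System.out.println(stripControl("a%8Fb"));    // a%8Fb
    }
}
```

    If the input may contain percent-escaped sequences, they would have to be decoded first, and how to decode them depends on the encoding the sender used.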

    If the blacklist is not neatly captured by a single Unicode block or category, Java supports character class arithmetic (intersection, subtraction, and so on) that you can use. Alternatively, you can use a negated whitelist: instead of explicitly specifying which characters are illegal, you specify which are legal, and everything else becomes illegal.
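
    As one possible application of class arithmetic to the control character problem, the sketch below removes Cc characters but keeps tab, line feed, and carriage return. Which characters to spare is an assumption made for illustration, and the class and method names are hypothetical:

```java
public class WhitespacePreservingStripper {
    // Intersection of \p{Cc} with "anything except \t, \n, \r":
    // all control characters minus the three common whitespace controls.
    private static final String CONTROL_EXCEPT_WHITESPACE = "[\\p{Cc}&&[^\\t\\n\\r]]";

    // Hypothetical helper: strips control characters but leaves tabs and line breaks.
    public static String strip(String input) {
        return input.replaceAll(CONTROL_EXCEPT_WHITESPACE, "");
    }

    public static void main(String[] args) {
        // NUL and BEL are removed; the tab and the newline survive.
        String cleaned = strip("a\tb\u0000c\nd\u0007");
        System.out.println(cleaned.equals("a\tbc\nd")); // true
    }
}
```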

    API links

    • java.util.regex.Pattern
    • regular-expressions.info/Character Class

    Examples

    Here's a subtraction example:

        System.out.println(
            "regular expressions: now you have two problems!!"
                .replaceAll("[a-z&&[^aeiou]]", "_")
        );
        //   _e_u_a_ e___e__io__: _o_ _ou _a_e __o __o__e__!!
    

    […] is a character class. Something like [aeiou] matches any one of the lowercase vowels. [^…] is a negated character class. [^aeiou] matches any one character that is not a lowercase vowel.

    [a-z&&[^aeiou]] is the intersection of [a-z] and [^aeiou], which amounts to [a-z] minus [aeiou], i.e. all lowercase consonants.

    The next example shows the negated whitelist approach:

        System.out.println(
            "regular expressions: now you have two problems!!"
                .replaceAll("[^a-z]", "_")
        );
        //   regular_expressions__now_you_have_two_problems__
    

    Here only the lowercase letters a-z are legal; every other character is treated as illegal and replaced.
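
    The same negated whitelist idea can be applied to the original question. The sketch below keeps letters, marks, numbers, punctuation, symbols, and space separators, and removes everything else, which includes the control characters. The choice of categories is an assumption for illustration, not something the question specifies, and the class and method names are hypothetical:

```java
public class WhitelistCleaner {
    // Negated whitelist: any character NOT in the listed Unicode general categories.
    // L = letters, M = marks, N = numbers, P = punctuation, S = symbols, Zs = space separators.
    private static final String NOT_ALLOWED = "[^\\p{L}\\p{M}\\p{N}\\p{P}\\p{S}\\p{Zs}]";

    // Hypothetical helper: removes every character outside the whitelist.
    public static String clean(String input) {
        return input.replaceAll(NOT_ALLOWED, "");
    }

    public static void main(String[] args) {
        // NUL and BEL fall outside the whitelist; the accented letter, '#', and the digit stay.
        String cleaned = clean("Caf\u00e9 \u0000#1\u0007");
        System.out.println(cleaned.equals("Caf\u00e9 #1")); // true
    }
}
```

    Note that this whitelist also drops tabs and line breaks, since they are control characters rather than space separators; add them to the class if they should be kept.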
