Remove ✅,

离开以前 2020-11-28 20:03

I have some strings with all kinds of different emojis/images/signs in them.

Not all the strings are in English -- some of them are in other non-Latin languages, for

7 answers
  •  暗喜 (OP)
    2020-11-28 20:43

    Instead of blacklisting some elements, how about creating a whitelist of the characters you do wish to keep? This way you don't need to update the filter every time new emojis are added to Unicode.

    // Whitelist: anything NOT in these categories is matched and removed
    String characterFilter = "[^\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{Cf}\\p{Cs}\\s]";
    String emotionless = aString.replaceAll(characterFilter, "");
    

    So:

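    • [\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{Cf}\\p{Cs}\\s] is a character class covering letters (\\p{L}), marks (\\p{M}), numbers (\\p{N}), punctuation (\\p{P}), separators such as spaces (\\p{Z}), format characters (\\p{Cf}), surrogates (\\p{Cs}) and whitespace, including newlines (\\s). \\p{L} includes letters from every script, such as Cyrillic, Latin and Kanji. Note that \\p{Cs} does not cover characters above U+FFFF: Java's regex engine matches those as whole code points (an emoji like U+1F600 is category So, "other symbol"), so \\p{Cs} only matches unpaired surrogates and such emojis are still removed.
    • The ^ at the start of the character class negates it, so the pattern matches every character outside these categories, and replaceAll deletes each match.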
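A quick way to see what \p{Cs} actually matches in Java: the sketch below (the class `SurrogateCheck` and method `isSurrogate` are hypothetical names) tests the category against U+1F600 written as a surrogate pair and against a lone high surrogate. Because `java.util.regex.Pattern` matches supplementary characters as whole code points, the emoji is seen as a single character of category So and does not match, while the unpaired surrogate does. So \p{Cs} does not whitelist emojis above U+FFFF, which is why the filter still strips them.

```java
import java.util.regex.Pattern;

public class SurrogateCheck {
    // \p{Cs} is the Unicode "Surrogate" general category
    private static final Pattern SURROGATE = Pattern.compile("\\p{Cs}");

    // True only if the whole string is one code point of category Cs
    static boolean isSurrogate(String s) {
        return SURROGATE.matcher(s).matches();
    }

    public static void main(String[] args) {
        // U+1F600 as a valid surrogate pair: the regex engine sees one code point of category So
        String pairedEmoji = "\uD83D\uDE00";
        // A high surrogate with no low surrogate after it: an unpaired surrogate, category Cs
        String loneSurrogate = "\uD83D";

        System.out.println(isSurrogate(pairedEmoji));   // false
        System.out.println(isSurrogate(loneSurrogate)); // true
    }
}
```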

    Example:

    String str = "hello world _# 皆さん、こんにちは! 私はジョンと申します。
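The example string above is cut off, so here is a self-contained sketch of the same filter run on an assumed input: part of the Japanese text from the example followed by a few emojis (the full original string is unknown). The names `EmojiFilter` and `removeEmojis` are hypothetical, and the pattern is precompiled once rather than recompiled on every `String.replaceAll` call.

```java
import java.util.regex.Pattern;

public class EmojiFilter {
    // Whitelist from the answer: letters, marks, numbers, punctuation, separators,
    // format characters, surrogates and whitespace. The leading ^ negates the class,
    // so the pattern matches every code point that is NOT in one of those categories.
    private static final Pattern NOT_WHITELISTED =
            Pattern.compile("[^\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{Cf}\\p{Cs}\\s]");

    // Deletes every code point outside the whitelist
    static String removeEmojis(String input) {
        return NOT_WHITELISTED.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // Assumed sample: Japanese text, then U+2705, U+2708, U+265B and U+1F600 (all category So)
        String str = "hello world _# 皆さん、こんにちは! \u2705\u2708\u265B\uD83D\uDE00";
        System.out.println(removeEmojis(str));
    }
}
```

Here `removeEmojis` returns "hello world _# 皆さん、こんにちは! " (the trailing space stays, the four emojis are gone). One caveat worth checking against your data: emoji presentation sequences often include U+FE0F (VARIATION SELECTOR-16), which is category Mn, and multi-person emojis are joined with U+200D (ZERO WIDTH JOINER), which is category Cf. Both are on the whitelist, so they can survive as invisible leftovers after the visible emoji is removed.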
