Replacing Emoji Unicode Range from Arabic Tweets using Java

萌比男神i 2021-01-01 05:10

I am trying to replace emoji in Arabic tweets using Java.

I used this code:

String line = "اييه تقولي اجل الارسنال تعادل امس بعد ما كان فايز


        
2 answers
  •  萌比男神i
    2021-01-01 05:40

    From the Javadoc for the Pattern class:

    A Unicode character can also be represented in a regular-expression by using its Hex notation(hexadecimal code point value) directly as described in construct \x{...}, for example a supplementary character U+2011F can be specified as \x{2011F}, instead of two consecutive Unicode escape sequences of the surrogate pair \uD840\uDD1F.

    This means that the regular expression that you're looking for is ([\x{1F601}-\x{1F64F}]). Of course, when you write this as a Java String literal, you must escape the backslashes.

    Pattern unicodeOutliers = Pattern.compile("([\\x{1F601}-\\x{1F64F}])");
    

    Note that the \x{...} construct is only available in Java 7 and later.
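To show how that pattern might be applied to a tweet, here is a minimal sketch. It assumes you want to swap every code point in the U+1F601 to U+1F64F range for a replacement string of your choice. The class name `EmojiStripper`, the method `replaceEmoticons`, and the sample text are made up for illustration and are not from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmojiStripper {
    // Same range as in the answer: code points U+1F601 through U+1F64F.
    // The \x{...} construct requires Java 7 or later.
    private static final Pattern EMOTICONS =
            Pattern.compile("[\\x{1F601}-\\x{1F64F}]");

    // Hypothetical helper: replaces each matching code point with `replacement`.
    // quoteReplacement keeps '$' and '\' in the replacement from being
    // interpreted as group references.
    static String replaceEmoticons(String text, String replacement) {
        return EMOTICONS.matcher(text)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // Illustrative input only: an Arabic word written with \u escapes so the
        // source file stays ASCII, followed by U+1F602 built from its code point.
        String emoji = new String(Character.toChars(0x1F602));
        String sample = "\u0645\u0631\u062D\u0628\u0627 " + emoji + " hello";

        System.out.println(replaceEmoticons(sample, ""));
    }
}
```

Passing an empty string removes the emoticons outright; passing something like `" "` keeps the surrounding words from running together. Code points outside the range, such as U+1F600, are left untouched, so widen the character class if you need to cover more emoji blocks.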
