Android - How to filter emoji (emoticons) from a string?

天命终不由人 2020-12-09 00:22

I'm working on an Android app, and I do not want people to use emoji in the input.

How can I remove emoji characters from a string?

4 answers
  • 2020-12-09 00:39

    Emojis can be found in the following ranges (source):

    • U+2190 to U+21FF
    • U+2600 to U+26FF
    • U+2700 to U+27BF
    • U+3000 to U+303F
    • U+1F300 to U+1F64F
    • U+1F680 to U+1F6FF

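    You can filter them all at once with a single regular expression. In Java, use replaceAll (replace treats its argument as a literal string), assign the result, and write code points above U+FFFF as \x{...}, because \u takes exactly four hex digits. Note that U+3000 to U+303F is the CJK Symbols and Punctuation block, so this also strips punctuation such as 。 and 、:

    text = text.replaceAll("[\\u2190-\\u21FF\\u2600-\\u26FF\\u2700-\\u27BF\\u3000-\\u303F\\x{1F300}-\\x{1F64F}\\x{1F680}-\\x{1F6FF}]", "");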
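
    As a self-contained illustration of the range-based filter, here is a minimal Java sketch. The class name EmojiRangeFilter and the method strip are made up for this example; the ranges are the ones listed above, and the \x{...} syntax is assumed to be available in the regex engine (it is in java.util.regex since Java 7).

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
final class EmojiRangeFilter {
    // The ranges quoted in the answer. Code points above U+FFFF use the
    // x{...} form because the four-digit escape cannot express them.
    private static final Pattern EMOJI = Pattern.compile(
            "[\\u2190-\\u21FF\\u2600-\\u26FF\\u2700-\\u27BF\\u3000-\\u303F"
            + "\\x{1F300}-\\x{1F64F}\\x{1F680}-\\x{1F6FF}]");

    /** Returns the input with every character in the listed ranges removed. */
    static String strip(String text) {
        return EMOJI.matcher(text).replaceAll("");
    }
}
```

    Compiling the pattern once and reusing it avoids recompiling the expression on every keystroke if this runs inside an input filter.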

  • 2020-12-09 00:48

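    For those using Kotlin, Char.isSurrogate() can help as well: drop every character for which it returns true. This only removes emoji that are encoded as surrogate pairs (code points above U+FFFF); symbols inside the Basic Multilingual Plane, such as U+2600, are left in place.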
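
    The same idea can be sketched in Java with Character.isSurrogate, which exists since Java 7. The class and method names below are hypothetical, and the approach removes every supplementary character, not just emoji.

```java
// Hypothetical helper for illustration; not part of any library.
final class SurrogateFilter {
    /** Drops every UTF-16 code unit that is half of a surrogate pair. */
    static String dropSurrogates(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // Both halves of a pair report true, so the whole pair disappears.
            if (!Character.isSurrogate(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```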

  • 2020-12-09 00:56

    Here is what I use to remove emojis. Note: this only works on API 24 and later.

    // requires: import android.os.Build;
    public String remove_Emojis_For_Devices_API_24_Onwards(String name)
    {
        // Character.UnicodeScript was not added until API 24, so on older
        // devices return the input unchanged instead of an empty string
        if (Build.VERSION.SDK_INT < 24) {
            return name;
        }

        // this is where we will store the reassembled name
        StringBuilder newName = new StringBuilder(name.length());

        /* we cycle through the string one UTF-16 char at a time and look up
         the Unicode script of each char */
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            // emoji above U+FFFF are stored as surrogate pairs, and a lone
            // surrogate half has no script, so it reports UNKNOWN; emoji in
            // the BMP (for example U+2600) report COMMON and are kept
            if (Character.UnicodeScript.of(c) != Character.UnicodeScript.UNKNOWN) {
                newName.append(c); // not a surrogate half, so we keep it
            }
        }
        return newName.toString();
    }
    

    so if we pass in a string:

    remove_Emojis_For_Devices_API_24_Onwards("

  • 2020-12-09 00:58

    The latest emoji data can be found here:

    http://unicode.org/Public/emoji/

    There is one folder per emoji version. As an app developer, it is a good idea to use the latest version available.

    When you look inside a folder, you'll see text files. Check emoji-data.txt; it contains all standard emoji code points.

    Emoji are spread across many small code point ranges. For the best coverage, check all of them in your app.
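
    Loading those ranges could look roughly like the sketch below. It assumes each data line has the shape "1F600..1F64F ; Emoji_Presentation # comment" (a single code point or a range, a semicolon, a property name, then a comment); EmojiDataParser, parse, and contains are hypothetical names. Which property to filter on is a judgment call: the plain Emoji property also covers the digits and '#', so a narrower property such as Emoji_Presentation may be a safer starting point.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of any library.
final class EmojiDataParser {
    /** Returns inclusive [start, end] code point ranges for one property. */
    static List<int[]> parse(Iterable<String> lines, String property) {
        List<int[]> ranges = new ArrayList<>();
        for (String line : lines) {
            // Everything after '#' is a comment.
            int hash = line.indexOf('#');
            String data = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (data.isEmpty()) {
                continue;
            }
            String[] fields = data.split(";");
            if (fields.length < 2 || !fields[1].trim().equals(property)) {
                continue;
            }
            // The first field is either "XXXX" or "XXXX..YYYY" in hex.
            String[] bounds = fields[0].trim().split("\\.\\.");
            int start = Integer.parseInt(bounds[0], 16);
            int end = bounds.length > 1 ? Integer.parseInt(bounds[1], 16) : start;
            ranges.add(new int[] {start, end});
        }
        return ranges;
    }

    /** Linear scan; fine for a sketch, a sorted array would be faster. */
    static boolean contains(List<int[]> ranges, int codePoint) {
        for (int[] r : ranges) {
            if (codePoint >= r[0] && codePoint <= r[1]) {
                return true;
            }
        }
        return false;
    }
}
```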

    Some people ask why there are 5-digit codes when only 4 hex digits can follow \u. Code points above U+FFFF do not fit in a single 16-bit char, so UTF-16 encodes each of them as a surrogate pair. That is why one emoji usually takes 2 chars.
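
    To make the relationship concrete, here is a small Java sketch that splits a code point into its UTF-16 units with Character.toChars and joins a pair back with plain arithmetic. SurrogatePairDemo and its method names are invented for this example.

```java
// Hypothetical helper for illustration; not part of any library.
final class SurrogatePairDemo {
    /** Splits a code point into its UTF-16 code units, printed as hex. */
    static String toUtf16Hex(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) unit));
        }
        return sb.toString();
    }

    /** Combines a high and a low surrogate back into one code point. */
    static int combine(char high, char low) {
        return ((high - 0xD800) * 0x400) + (low - 0xDC00) + 0x10000;
    }
}
```

    For U+1F600 this gives the units D83D and DE00, and combining those two units yields 0x1F600 again, the same value Character.toCodePoint returns.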

    For example, suppose we have a string:

    String s = ...;
    

    Get its UTF-16 representation (big-endian, two bytes per char):

    byte[] utf16 = s.getBytes(StandardCharsets.UTF_16BE);
    

    Iterate over the UTF-16 bytes, two at a time:

    for(int i = 0; i < utf16.length; i += 2) {
    

    Rebuild one char from each pair of bytes:

    char c = (char)((char)(utf16[i] & 0xff) << 8 | (char)(utf16[i + 1] & 0xff));
    

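    Now check for surrogate pairs, keeping the two halves in char variables high and low declared before the loop. Most emoji are located in Plane 1 (U+10000 to U+1FFFF), so check whether the first half of the pair is in the range 0xd800..0xd83f.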

    if(c >= 0xd800 && c <= 0xd83f) {
        high = c;
        continue;
    }
    

    The second half of a surrogate pair is in the range 0xdc00..0xdfff. Once we have it, we can convert the pair into a single 5-digit code point.

    else if(c >= 0xdc00 && c <= 0xdfff) {
        low = c;
        long unicode = (((long)high - 0xd800) * 0x400) + ((long)low - 0xdc00) + 0x10000;
    }
    

    All other chars are not part of a pair, so process them as is.

    else {
        long unicode = c;
    }
    

    Now use the data from emoji-data.txt to check whether the code point is an emoji. If it is, skip it. If not, copy its bytes to the output byte array.

    Finally, convert the output byte array back to a String:

    String out = new String(outarray, Charset.forName("UTF-16BE"));
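
    Putting the steps together, a compact sketch might look like the following. It swaps the manual byte decoding for String.codePointAt, which resolves surrogate pairs itself, so it is a variation on the walk-through rather than a line-by-line transcription. EmojiStripper, isEmoji, and SAMPLE_RANGES are hypothetical names, and the hard-coded ranges are only a small illustrative subset; a real app would load them from emoji-data.txt.

```java
// Hypothetical helper for illustration; not part of any library.
final class EmojiStripper {
    // Illustrative subset only; real ranges should come from emoji-data.txt.
    private static final int[][] SAMPLE_RANGES = {
        {0x2600, 0x26FF}, {0x2700, 0x27BF}, {0x1F300, 0x1F64F}, {0x1F680, 0x1F6FF}
    };

    static boolean isEmoji(int codePoint) {
        for (int[] range : SAMPLE_RANGES) {
            if (codePoint >= range[0] && codePoint <= range[1]) {
                return true;
            }
        }
        return false;
    }

    /** Copies every code point that is not in the sample ranges. */
    static String strip(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            // codePointAt joins a high and low surrogate into one value.
            int codePoint = s.codePointAt(i);
            if (!isEmoji(codePoint)) {
                out.appendCodePoint(codePoint);
            }
            // Advance by 1 char for BMP code points, 2 for surrogate pairs.
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }
}
```

    Working on code points instead of bytes keeps the pair handling in one place and avoids the round trip through a byte array.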
    