What is the regex to extract all the emojis from a string?

情到浓时终转凉″ 提交于 2019-11-26 03:28:11

问题


I have a String encoded in UTF-8. For example:

Thats a nice joke 😆😆😆 😛

I have to extract all the emojis present in the sentence. And the emoji could be any

When this sentence is viewed in terminal using command less text.txt it is viewed as:

Thats a nice joke <U+1F606><U+1F606><U+1F606> <U+1F61B>

This is the corresponding UTF code for the emoji. All the codes for emojis can be found at emojitracker.

For the purpose of finding all the occurances, I used a regular expression pattern (<U\\+\\w+?>) but it didnt work for the UTF-8 encoded string.

Following is my code:

    String s=\"Thats a nice joke 😆😆😆 😛\";
    Pattern pattern = Pattern.compile(\"(<U\\\\+\\\\w+?>)\");
    Matcher matcher = pattern.matcher(s);
    List<String> matchList = new ArrayList<String>();

    while (matcher.find()) {
        matchList.add(matcher.group());
    }

    for(int i=0;i<matchList.size();i++){
        System.out.println(matchList.get(i));

    }

This pdf says Range: 1F300–1F5FF for Miscellaneous Symbols and Pictographs. So I want to capture any character lying within this range.


回答1:


the pdf that you just mentioned says Range: 1F300–1F5FF for Miscellaneous Symbols and Pictographs. So lets say I want to capture any character lying within this range. Now what to do?

Okay, but I will just note that the emoji in your question are outside that range! :-)

The fact that these are above 0xFFFF complicates things, because Java strings store UTF-16. So we can't just use one simple character class for it. We're going to have surrogate pairs. (More: http://www.unicode.org/faq/utf_bom.html)

U+1F300 in UTF-16 ends up being the pair \uD83C\uDF00; U+1F5FF ends up being \uD83D\uDDFF. Note that the first character went up, we cross at least one boundary. So we have to know what ranges of surrogate pairs we're looking for.

Not being steeped in knowledge about the inner workings of UTF-16, I wrote a program to find out (source at the end — I'd double-check it if I were you, rather than trusting me). It tells me we're looking for \uD83C followed by anything in the range \uDF00-\uDFFF (inclusive), or \uD83D followed by anything in the range \uDC00-\uDDFF (inclusive).

So armed with that knowledge, in theory we could now write a pattern:

// This is wrong, keep reading
Pattern p = Pattern.compile("(?:\uD83C[\uDF00-\uDFFF])|(?:\uD83D[\uDC00-\uDDFF])");

That's an alternation of two non-capturing groups, the first group for the pairs starting with \uD83C, and the second group for the pairs starting with \uD83D.

But that fails (doesn't find anything). I'm fairly sure it's because we're trying to specify half of a surrogate pair in various places:

Pattern p = Pattern.compile("(?:\uD83C[\uDF00-\uDFFF])|(?:\uD83D[\uDC00-\uDDFF])");
// Half of a pair --------------^------^------^-----------^------^------^

We can't just split up surrogate pairs like that, they're called surrogate pairs for a reason. :-)

Consequently, I don't think we can use regular expressions (or indeed, any string-based approach) for this at all. I think we have to search through char arrays.

char arrays hold UTF-16 values, so we can find those half-pairs in the data if we look for it the hard way:

String s = new StringBuilder()
                .append("Thats a nice joke ")
                .appendCodePoint(0x1F606)
                .appendCodePoint(0x1F606)
                .appendCodePoint(0x1F606)
                .append(" ")
                .appendCodePoint(0x1F61B)
                .toString();
char[] chars = s.toCharArray();
int index;
char ch1;
char ch2;

index = 0;
while (index < chars.length - 1) { // -1 because we're looking for two-char-long things
    ch1 = chars[index];
    if ((int)ch1 == 0xD83C) {
        ch2 = chars[index+1];
        if ((int)ch2 >= 0xDF00 && (int)ch2 <= 0xDFFF) {
            System.out.println("Found emoji at index " + index);
            index += 2;
            continue;
        }
    }
    else if ((int)ch1 == 0xD83D) {
        ch2 = chars[index+1];
        if ((int)ch2 >= 0xDC00 && (int)ch2 <= 0xDDFF) {
            System.out.println("Found emoji at index " + index);
            index += 2;
            continue;
        }
    }
    ++index;
}

Obviously that's just debug-level code, but it does the job. (In your given string, with its emoji, of course it won't find anything as they're outside the range. But if you change the upper bound on the second pair to 0xDEFF instead of 0xDDFF, it will. No idea if that would also include non-emojis, though.)


Source of my program to find out what the surrogate ranges were:

public class FindRanges {

    public static void main(String[] args) {
        char last0 = '\0';
        char last1 = '\0';
        for (int x = 0x1F300; x <= 0x1F5FF; ++x) {
            char[] chars = new StringBuilder().appendCodePoint(x).toString().toCharArray();
            if (chars[0] != last0) {
                if (last0 != '\0') {
                    System.out.println("-\\u" + Integer.toHexString((int)last1).toUpperCase());
                }
                System.out.print("\\u" + Integer.toHexString((int)chars[0]).toUpperCase() + " \\u" + Integer.toHexString((int)chars[1]).toUpperCase());
                last0 = chars[0];
            }
            last1 = chars[1];
        }
        if (last0 != '\0') {
            System.out.println("-\\u" + Integer.toHexString((int)last1).toUpperCase());
        }
    }
}

Output:

\uD83C \uDF00-\uDFFF
\uD83D \uDC00-\uDDFF



回答2:


Using emoji-java i've wrote a simple method that removes all emojis including fitzpatrick modifiers. Requires an external library but easier to maintain than those monster regexes.

Use:

String input = "A string 😄with a \uD83D\uDC66\uD83C\uDFFFfew 😉emojis!";
String result = EmojiParser.removeAllEmojis(input);

emoji-java maven installation:

<dependency>
  <groupId>com.vdurmont</groupId>
  <artifactId>emoji-java</artifactId>
  <version>3.1.3</version>
</dependency>

gradle:

compile 'com.vdurmont:emoji-java:3.1.3'

EDIT: previously submitted answer was pulled into emoji-java source code.




回答3:


Had a similar problem. The following served me well and matches surrogate pairs

public class SplitByUnicode {
    public static void main(String[] argv) throws Exception {
        String string = "Thats a nice joke 😆😆😆 😛";
        System.out.println("Original String:"+string);
        String regexPattern = "[\uD83C-\uDBFF\uDC00-\uDFFF]+";
        byte[] utf8 = string.getBytes("UTF-8");

        String string1 = new String(utf8, "UTF-8");

        Pattern pattern = Pattern.compile(regexPattern);
        Matcher matcher = pattern.matcher(string1);
        List<String> matchList = new ArrayList<String>();

        while (matcher.find()) {
            matchList.add(matcher.group());
        }

        for(int i=0;i<matchList.size();i++){
            System.out.println(i+":"+matchList.get(i));

        }
    }
}

Output is:


Original String:Thats a nice joke 😆😆😆 😛
0:😆😆😆
1:😛

Found the regex from https://stackoverflow.com/a/24071599/915972




回答4:


This worked for me in java 8:

public static String mysqlSafe(String input) {
  if (input == null) return null;
    StringBuilder sb = new StringBuilder();

    for (int i = 0; i < input.length(); i++) {
      if (i < (input.length() - 1)) { // Emojis are two characters long in java, e.g. a rocket emoji is "\uD83D\uDE80";
        if (Character.isSurrogatePair(input.charAt(i), input.charAt(i + 1))) {
          i += 1; //also skip the second character of the emoji
          continue;
        }
      }
      sb.append(input.charAt(i));
    }

  return sb.toString();
}



回答5:


you can do it like this

    String s="Thats a nice joke 😆😆😆 😛";
    Pattern pattern = Pattern.compile("[\ud83c\udc00-\ud83c\udfff]|[\ud83d\udc00-\ud83d\udfff]|[\u2600-\u27ff]",
                                      Pattern.UNICODE_CASE | Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(s);
    List<String> matchList = new ArrayList<String>();

    while (matcher.find()) {
        matchList.add(matcher.group());
    }

    for(int i=0;i<matchList.size();i++){
        System.out.println(matchList.get(i));
    }



回答6:


The best regex for extracting ALL emoji is this:

(?:[\u2700-\u27bf]|(?:\ud83c[\udde6-\uddff]){2}|[\ud800-\udbff][\udc00-\udfff]|[\u0023-\u0039]\ufe0f?\u20e3|\u3299|\u3297|\u303d|\u3030|\u24c2|\ud83c[\udd70-\udd71]|\ud83c[\udd7e-\udd7f]|\ud83c\udd8e|\ud83c[\udd91-\udd9a]|\ud83c[\udde6-\uddff]|[\ud83c[\ude01-\ude02]|\ud83c\ude1a|\ud83c\ude2f|[\ud83c[\ude32-\ude3a]|[\ud83c[\ude50-\ude51]|\u203c|\u2049|[\u25aa-\u25ab]|\u25b6|\u25c0|[\u25fb-\u25fe]|\u00a9|\u00ae|\u2122|\u2139|\ud83c\udc04|[\u2600-\u26FF]|\u2b05|\u2b06|\u2b07|\u2b1b|\u2b1c|\u2b50|\u2b55|\u231a|\u231b|\u2328|\u23cf|[\u23e9-\u23f3]|[\u23f8-\u23fa]|\ud83c\udccf|\u2934|\u2935|[\u2190-\u21ff])

It identifies many single-char emoji that the other answers do not account for. For more information about how this regex works, take a look at this post. https://medium.com/@thekevinscott/emojis-in-javascript-f693d0eb79fb#.enomgcu63




回答7:


Assuming that you are asking for standard Unicode emoji ranges (there are different blocks by vendor) you may consider these three ranges:

  • 0x20a0 - 0x32ff
  • 0x1f000 - 0x1ffff
  • 0xfe4e5 - 0xfe4ee

Besides all the thoughtful explanation that T.J.Crowder has shared with us, needs to be said that beginning with Java 7 is possible to match UTF-16 encoded surrogate pairs with ease.

Take a look at the docs:

http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

A Unicode character can also be represented in a regular-expression by using its Hex notation(hexadecimal code point value) directly as described in construct \x{...}, for example a supplementary character U+2011F can be specified as \x{2011F}, instead of two consecutive Unicode escape sequences of the surrogate pair \uD840\uDD1F.

Nevertheless, if you cannot switch to Java 7, you can extend the valuable UnicodeEscaper provided by Guava.

Here an implementation for the sake of example:

public class SimpleEscaper extends UnicodeEscaper
{
    @Override
    protected char[] escape(int codePoint)
    {
        if (0x1f000 >= codePoint && codePoint <= 0x1ffff)
        {
            return Integer.toHexString(codePoint).toCharArray();
        }

        return Character.toChars(codePoint);
    }
}



回答8:


Emoji regex

public static final String sEmojiRegex = "(?:[\\u2700-\\u27bf]|" +

        "(?:[\\ud83c\\udde6-\\ud83c\\uddff]){2}|" +
        "[\\ud800\\udc00-\\uDBFF\\uDFFF]|[\\u2600-\\u26FF])[\\ufe0e\\ufe0f]?(?:[\\u0300-\\u036f\\ufe20-\\ufe23\\u20d0-\\u20f0]|[\\ud83c\\udffb-\\ud83c\\udfff])?" +

        "(?:\\u200d(?:[^\\ud800-\\udfff]|" +

        "(?:[\\ud83c\\udde6-\\ud83c\\uddff]){2}|" +
        "[\\ud800\\udc00-\\uDBFF\\uDFFF]|[\\u2600-\\u26FF])[\\ufe0e\\ufe0f]?(?:[\\u0300-\\u036f\\ufe20-\\ufe23\\u20d0-\\u20f0]|[\\ud83c\\udffb-\\ud83c\\udfff])?)*|" +

        "[\\u0023-\\u0039]\\ufe0f?\\u20e3|\\u3299|\\u3297|\\u303d|\\u3030|\\u24c2|[\\ud83c\\udd70-\\ud83c\\udd71]|[\\ud83c\\udd7e-\\ud83c\\udd7f]|\\ud83c\\udd8e|[\\ud83c\\udd91-\\ud83c\\udd9a]|[\\ud83c\\udde6-\\ud83c\\uddff]|[\\ud83c\\ude01-\\ud83c\\ude02]|\\ud83c\\ude1a|\\ud83c\\ude2f|[\\ud83c\\ude32-\\ud83c\\ude3a]|[\\ud83c\\ude50-\\ud83c\\ude51]|\\u203c|\\u2049|[\\u25aa-\\u25ab]|\\u25b6|\\u25c0|[\\u25fb-\\u25fe]|\\u00a9|\\u00ae|\\u2122|\\u2139|\\ud83c\\udc04|[\\u2600-\\u26FF]|\\u2b05|\\u2b06|\\u2b07|\\u2b1b|\\u2b1c|\\u2b50|\\u2b55|\\u231a|\\u231b|\\u2328|\\u23cf|[\\u23e9-\\u23f3]|[\\u23f8-\\u23fa]|\\ud83c\\udccf|\\u2934|\\u2935|[\\u2190-\\u21ff]";

some emojis (1627)

// count = 1627
public static final String sEmojiTest = "😀😃😄😁😆😅😂🤣☺️😊😇🙂🙃😉😌😍😘😗😙😚😋😜😝😛🤑🤗🤓😎🤡🤠😏😒😞😔😟😕🙁☹️😣😖😫😩😤😠😡😶😐😑😯😦😧😮😲😵😳😱😨😰😢😥🤤😭😓😪😴🙄🤔🤥😬🤐🤢🤧😷🤒🤕😈👿👹👺💩👻💀☠️👽👾🤖🎃😺😸😹😻😼😽🙀😿😾👐🙌👏🙏🤝👍👎👊✊🤛🤜🤞✌️🤘👌👈👉👆👇☝️✋🤚🖐🖖👋🤙💪🖕✍️🤳💅💍💄💋👄👅👂👃👣👁👀🗣👤👥👶👦👧👨👩👱‍♀👱👴👵👲👳‍♀👳👮‍♀👮👷‍♀👷💂‍♀💂🕵️‍♀️🕵👩‍⚕👨‍⚕👩‍🌾👨‍🌾👩‍🍳👨‍🍳👩‍🎓👨‍🎓👩‍🎤👨‍🎤👩‍🏫👨‍🏫👩‍🏭👨‍🏭👩‍💻👨‍💻👩‍💼👨‍💼👩‍🔧👨‍🔧👩‍🔬👨‍🔬👩‍🎨👨‍🎨👩‍🚒👨‍🚒👩‍✈👨‍✈👩‍🚀👨‍🚀👩‍⚖👨‍⚖🤶🎅👸🤴👰🤵👼🤰🙇‍♀🙇💁💁‍♂🙅🙅‍♂🙆🙆‍♂🙋🙋‍♂🤦‍♀🤦‍♂🤷‍♀🤷‍♂🙎🙎‍♂🙍🙍‍♂💇💇‍♂💆💆‍♂🕴💃🕺👯👯‍♂🚶‍♀🚶🏃‍♀🏃👫👭👬💑👩‍❤️‍👩👨‍❤️‍👨💏👩‍❤️‍💋‍👩👨‍❤️‍💋‍👨👪👨‍👩‍👧👨‍👩‍👧‍👦👨‍👩‍👦‍👦👨‍👩‍👧‍👧👩‍👩‍👦👩‍👩‍👧👩‍👩‍👧‍👦👩‍👩‍👦‍👦👩‍👩‍👧‍👧👨‍👨‍👦👨‍👨‍👧👨‍👨‍👧‍👦👨‍👨‍👦‍👦👨‍👨‍👧‍👧👩‍👦👩‍👧👩‍👧‍👦👩‍👦‍👦👩‍👧‍👧👨‍👦👨‍👧👨‍👧‍👦👨‍👦‍👦👨‍👧‍👧👚👕👖👔👗👙👘👠👡👢👞👟👒🎩🎓👑⛑🎒👝👛👜💼👓🕶🌂☂️🐶🐱🐭🐹🐰🦊🐻🐼🐨🐯🦁🐮🐷🐽🐸🐵🙈🙉🙊🐒🐔🐧🐦🐤🐣🐥🦆🦅🦉🦇🐺🐗🐴🦄🐝🐛🦋🐌🐚🐞🐜🕷🕸🐢🐍🦎🦂🦀🦑🐙🦐🐠🐟🐡🐬🦈🐳🐋🐊🐆🐅🐃🐂🐄🦌🐪🐫🐘🦏🦍🐎🐖🐐🐏🐑🐕🐩🐈🐓🦃🕊🐇🐁🐀🐿🐾🐉🐲🌵🎄🌲🌳🌴🌱🌿☘️🍀🎍🎋🍃🍂🍁🍄🌾💐🌷🌹🥀🌻🌼🌸🌺🌎🌍🌏🌕🌖🌗🌘🌑🌒🌓🌔🌚🌝🌞🌛🌜🌙💫⭐️🌟✨⚡️🔥💥☄☀️🌤⛅️🌥🌦🌈☁️🌧⛈🌩🌨☃️⛄️❄️🌬💨🌪🌫🌊💧💦☔️🍏🍎🍐🍊🍋🍌🍉🍇🍓🍈🍒🍑🍍🥝🥑🍅🍆🥒🥕🌽🌶🥔🍠🌰🥜🍯🥐🍞🥖🧀🥚🍳🥓🥞🍤🍗🍖🍕🌭🍔🍟🥙🌮🌯🥗🥘🍝🍜🍲🍥🍣🍱🍛🍚🍙🍘🍢🍡🍧🍨🍦🍰🎂🍮🍭🍬🍫🍿🍩🍪🥛🍼☕️🍵🍶🍺🍻🥂🍷🥃🍸🍹🍾🥄🍴🍽⚽️🏀🏈⚾️🎾🏐🏉🎱🏓🏸🥅🏒🏑🏏⛳️🏹🎣🥊🥋⛸🎿⛷🏂🏋️‍♀️🏋🤺🤼‍♀🤼‍♂🤸‍♀🤸‍♂⛹️‍♀️⛹🤾‍♀🤾‍♂🏌️‍♀️🏌🏄‍♀🏄🏊‍♀🏊🤽‍♀🤽‍♂🚣‍♀🚣🏇🚴‍♀🚴🚵‍♀🚵🎽🏅🎖🥇🥈🥉🏆🏵🎗🎫🎟🎪🤹‍♀🤹‍♂🎭🎨🎬🎤🎧🎼🎹🥁🎷🎺🎸🎻🎲🎯🎳🎮🎰🚗🚕🚙🚌🚎🏎🚓🚑🚒🚐🚚🚛🚜🛴🚲🛵🏍🚨🚔🚍🚘🚖🚡🚠🚟🚃🚋🚞🚝🚄🚅🚈🚂🚆🚇🚊🚉🚁🛩✈️🛫🛬🚀🛰💺🛶⛵️🛥🚤🛳⛴🚢⚓️🚧⛽️🚏🚦🚥🗺🗿🗽⛲️🗼🏰🏯🏟🎡🎢🎠⛱🏖🏝⛰🏔🗻🌋🏜🏕⛺️🛤🛣🏗🏭🏠🏡🏘🏚🏢🏬🏣🏤🏥🏦🏨🏪🏫🏩💒🏛⛪️🕌🕍🕋⛩🗾🎑🏞🌅🌄🌠🎇🎆🌇🌆🏙🌃🌌🌉🌁⌚️📱📲💻⌨️🖥🖨🖱🖲🕹🗜💽💾💿📀📼📷📸📹🎥📽🎞📞☎️📟📠📺📻🎙🎚🎛⏱⏲⏰🕰⌛️⏳📡🔋🔌💡🔦🕯🗑🛢💸💵💴💶💷💰💳💎⚖️🔧🔨⚒🛠⛏🔩⚙️⛓🔫💣🔪🗡⚔️🛡🚬⚰️⚱️🏺🔮📿💈⚗️🔭🔬🕳💊💉🌡🚽🚰🚿🛁🛀🛎🔑🗝🚪🛋🛏🛌🖼🛍🛒🎁🎈🎏🎀🎊🎉🎎🏮🎐✉️📩📨📧💌📥📤📦🏷📪📫📬📭📮📯📜📃📄📑📊📈📉🗒🗓📆📅📇🗃🗳🗄📋📁📂🗂🗞📰📓📔📒📕📗📘📙📚📖🔖🔗📎🖇📐📏📌📍✂️🖊🖋✒️🖌🖍📝✏️🔍🔎🔏🔐🔒🔓❤️💛💚💙💜🖤💔❣️💕💞💓💗💖💘💝💟☮️✝️☪️🕉☸️✡️🔯🕎☯️☦️🛐⛎♈️♉️♊️♋️♌️♍️♎️♏️♐️♑️♒️♓️🆔⚛️🉑☢️☣️📴📳🈶🈚️🈸🈺🈷️✴️🆚💮🉐㊙️㊗️🈴🈵🈹🈲🅰️🅱️🆎🆑🅾️🆘❌⭕️🛑⛔️📛🚫💯💢♨️🚷🚯🚳🚱🔞📵🚭❗️❕❓❔‼️⁉️🔅🔆〽️⚠️🚸🔱⚜️🔰♻️✅🈯️💹❇️✳️❎🌐💠Ⓜ️🌀💤🏧🚾♿️🅿️🈳🈂️🛂🛃🛄🛅🚹🚺🚼🚻🚮🎦📶🈁🔣ℹ️🔤🔡🔠🆖🆗🆙🆒🆕🆓0️⃣1️⃣2️⃣3️⃣4️⃣5️⃣6️⃣7️⃣8️⃣9️⃣🔟🔢#️⃣*️⃣▶️⏸⏯⏹⏺⏭⏮⏩⏪⏫⏬◀️🔼🔽➡️⬅️⬆️⬇️↗️↘️↙️↖️↕️↔️↪️↩️⤴️⤵️🔀🔁🔂🔄🔃🎵🎶➕➖➗✖️💲💱™️©️®️〰️➰➿🔚🔙🔛🔝🔜✔️☑️🔘⚪️⚫️🔴🔵🔺🔻🔸🔹🔶🔷🔳🔲▪️▫️◾️◽️◼️◻️⬛️⬜️🔈🔇🔉🔊🔔🔕📣📢👁‍🗨💬💭🗯♠️♣️♥️♦️🃏🎴🀄️🕐🕑🕒🕓🕔🕕🕖🕗🕘🕙🕚🕛🕜🕝🕞🕟🕠🕡🕢🕣🕤🕥🕦🕧🏳️🏴🏁🚩🏳️‍🌈🇦🇫🇦🇽🇦🇱🇩🇿🇦🇸🇦🇩🇦🇴🇦🇮🇦🇶🇦🇬🇦🇷🇦🇲🇦🇼🇦🇺🇦🇹🇦🇿🇧🇸🇧🇭🇧🇩🇧🇧🇧🇾🇧🇪🇧🇿🇧🇯🇧🇲🇧🇹🇧🇴🇧🇶🇧🇦🇧🇼🇧🇷🇮🇴🇻🇬🇧🇳🇧🇬🇧🇫🇧🇮🇨🇻🇰🇭🇨🇲🇨🇦🇮🇨🇰🇾🇨🇫🇹🇩🇨🇱🇨🇳🇨🇽🇨🇨🇨🇴🇰🇲🇨🇬🇨🇩🇨🇰🇨🇷🇨🇮🇭🇷🇨🇺🇨🇼🇨🇾🇨🇿🇩🇰🇩🇯🇩🇲🇩🇴🇪🇨🇪🇬🇸🇻🇬🇶🇪🇷🇪🇪🇪🇹🇪🇺🇫🇰🇫🇴🇫🇯🇫🇮🇫🇷🇬🇫🇵🇫🇹🇫🇬🇦🇬🇲🇬🇪🇩🇪🇬🇭🇬🇮🇬🇷🇬🇱🇬🇩🇬🇵🇬🇺🇬🇹🇬🇬🇬🇳🇬🇼🇬🇾🇭🇹🇭🇳🇭🇰🇭🇺🇮🇸🇮🇳🇮🇩🇮🇷🇮🇶🇮🇪🇮🇲🇮🇱🇮🇹🇯🇲🇯🇵🎌🇯🇪🇯🇴🇰🇿🇰🇪🇰🇮🇽🇰🇰🇼🇰🇬🇱🇦🇱🇻🇱🇧🇱🇸🇱🇷🇱🇾🇱🇮🇱🇹🇱🇺🇲🇴🇲🇰🇲🇬🇲🇼🇲🇾🇲🇻🇲🇱🇲🇹🇲🇭🇲🇶🇲🇷🇲🇺🇾🇹🇲🇽🇫🇲🇲🇩🇲🇨🇲🇳🇲🇪🇲🇸🇲🇦🇲🇿🇲🇲🇳🇦🇳🇷🇳🇵🇳🇱🇳🇨🇳🇿🇳🇮🇳🇪🇳🇬🇳🇺🇳🇫🇲🇵🇰🇵🇳🇴🇴🇲🇵🇰🇵🇼🇵🇸🇵🇦🇵🇬🇵🇾🇵🇪🇵🇭🇵🇳🇵🇱🇵🇹🇵🇷🇶🇦🇷🇪🇷🇴🇷🇺🇷🇼🇧🇱🇸🇭🇰🇳🇱🇨🇵🇲🇻🇨🇼🇸🇸🇲🇸🇹🇸🇦🇸🇳🇷🇸🇸🇨🇸🇱🇸🇬🇸🇽🇸🇰🇸🇮🇸🇧🇸🇴🇿🇦🇬🇸🇰🇷🇸🇸🇪🇸🇱🇰🇸🇩🇸🇷🇸🇿🇸🇪🇨🇭🇸🇾🇹🇼🇹🇯🇹🇿🇹🇭🇹🇱🇹🇬🇹🇰🇹🇴🇹🇹🇹🇳🇹🇷🇹🇲🇹🇨🇹🇻🇺🇬🇺🇦🇦🇪🇬🇧🇺🇸🇻🇮🇺🇾🇺🇿🇻🇺🇻🇦🇻🇪🇻🇳🇼🇫🇪🇭🇾🇪🇿🇲🇿🇼⚽️🏀🏈⚾️🎾🏐🏉🎱🏓🏸🥅🏒🏑🏏⛳️🏹🎣🥊🥋⛸🎿⛷🏂🏋️‍♀️🏋🏻‍♀️🏋🏼‍♀️🏋🏽‍♀️🏋🏾‍♀️🏋🏿‍♀️🏋️🏋🏻🏋🏼🏋🏽🏋🏾🏋🏿🤺🤼‍♀️🤼‍♂️🤸‍♀️🤸🏻‍♀️🤸🏼‍♀️🤸🏽‍♀️🤸🏾‍♀️🤸🏿‍♀️🤸‍♂️🤸🏻‍♂️🤸🏼‍♂️🤸🏽‍♂️🤸🏾‍♂️🤸🏿‍♂️⛹️‍♀️⛹🏻‍♀️⛹🏼‍♀️⛹🏽‍♀️⛹🏾‍♀️⛹🏿‍♀️⛹️⛹🏻⛹🏼⛹🏽⛹🏾⛹🏿🤾‍♀️🤾🏻‍♀️🤾🏼‍♀️🤾🏽‍♀️🤾🏾‍♀️🤾🏿‍♀️🤾‍♂️🤾🏻‍♂️🤾🏼‍♂️🤾🏽‍♂️🤾🏾‍♂️🤾🏿‍♂️🏌️‍♀️🏌🏻‍♀️🏌🏼‍♀️🏌🏽‍♀️🏌🏾‍♀️🏌🏿‍♀️🏌️🏌🏻🏌🏼🏌🏽🏌🏾🏌🏿🏄‍♀️🏄🏻‍♀️🏄🏼‍♀️🏄🏽‍♀️🏄🏾‍♀️🏄🏿‍♀️🏄🏄🏻🏄🏼🏄🏽🏄🏾🏄🏿🏊‍♀️🏊🏻‍♀️🏊🏼‍♀️🏊🏽‍♀️🏊🏾‍♀️🏊🏿‍♀️🏊🏊🏻🏊🏼🏊🏽🏊🏾🏊🏿🤽‍♀️🤽🏻‍♀️🤽🏼‍♀️🤽🏽‍♀️🤽🏾‍♀️🤽🏿‍♀️🤽‍♂️🤽🏻‍♂️🤽🏼‍♂️🤽🏽‍♂️🤽🏾‍♂️🤽🏿‍♂️🚣‍♀️🚣🏻‍♀️🚣🏼‍♀️🚣🏽‍♀️🚣🏾‍♀️🚣🏿‍♀️🚣🚣🏻🚣🏼🚣🏽🚣🏾🚣🏿🏇🏇🏻🏇🏼🏇🏽🏇🏾🏇🏿🚴‍♀️🚴🏻‍♀️🚴🏼‍♀️🚴🏽‍♀️🚴🏾‍♀️🚴🏿‍♀️🚴🚴🏻🚴🏼🚴🏽🚴🏾🚴🏿🚵‍♀️🚵🏻‍♀️🚵🏼‍♀️🚵🏽‍♀️🚵🏾‍♀️🚵🏿‍♀️🚵🚵🏻🚵🏼🚵🏽🚵🏾🚵🏿🎽🏅🎖🥇🥈🥉🏆🏵🎗🎫🎟🎪🤹‍♀️🤹‍♂️🎭🎨🎬🎤🎧🎼🎹🥁🎷🎺🎸🎻🎲🎯🎳🎮🎰";

function to test emojis

public void checkMatchingEmojis() {

    final Pattern pattern = Pattern.compile(sEmojiRegex);
    final Matcher matcher = pattern.matcher(sEmojiTest);
    int foundEmojiCount = 0;
    while (matcher.find()) {
        System.out.println("Full match: " + matcher.group(0));
        foundEmojiCount++;
    }
    System.out.println("*******************************************");
    System.out.println("Input Emoji count = 1627");
    System.out.println("Captured Emoji count = " + foundEmojiCount);
    System.out.println("*******************************************");

}

Here is the gist, tested on all unicode 10 emojis

Thanks to Kevin Scott for writting greate example




回答9:


You may also use emoji4j library.

String emojiText = "A 🐱, 🐱 and a 🐭 became friends. For 🐶's birthday party, they all had 🍔s, 🍟s, 🍪s and 🍰.";

EmojiUtils.removeAllEmojis(emojiText);//returns "A ,  and a  became friends. For 's birthday party, they all had s, s, s and .



回答10:


There are two ways to solve this sticky problem.

The first one is Using third-party libs like emoji-java and emoji4j. These are mentioned above. You can easily use the method containsEmoji or removesEmoji, etc. And in your own Apps, you need to keep update with these libs.

As for me, I want to find a simple solution to solve this problem.

After a whole day of searching, I've found a magic regex:

"(?:[\uD83C\uDF00-\uD83D\uDDFF]|[\uD83E\uDD00-\uD83E\uDDFF]|[\uD83D\uDE00-\uD83D\uDE4F]|[\uD83D\uDE80-\uD83D\uDEFF]|[\u2600-\u26FF]\uFE0F?|[\u2700-\u27BF]\uFE0F?|\u24C2\uFE0F?|[\uD83C\uDDE6-\uD83C\uDDFF]{1,2}|[\uD83C\uDD70\uD83C\uDD71\uD83C\uDD7E\uD83C\uDD7F\uD83C\uDD8E\uD83C\uDD91-\uD83C\uDD9A]\uFE0F?|[\u0023\u002A\u0030-\u0039]\uFE0F?\u20E3|[\u2194-\u2199\u21A9-\u21AA]\uFE0F?|[\u2B05-\u2B07\u2B1B\u2B1C\u2B50\u2B55]\uFE0F?|[\u2934\u2935]\uFE0F?|[\u3030\u303D]\uFE0F?|[\u3297\u3299]\uFE0F?|[\uD83C\uDE01\uD83C\uDE02\uD83C\uDE1A\uD83C\uDE2F\uD83C\uDE32-\uD83C\uDE3A\uD83C\uDE50\uD83C\uDE51]\uFE0F?|[\u203C\u2049]\uFE0F?|[\u25AA\u25AB\u25B6\u25C0\u25FB-\u25FE]\uFE0F?|[\u00A9\u00AE]\uFE0F?|[\u2122\u2139]\uFE0F?|\uD83C\uDC04\uFE0F?|\uD83C\uDCCF\uFE0F?|[\u231A\u231B\u2328\u23CF\u23E9-\u23F3\u23F8-\u23FA]\uFE0F?)"

which I have tested OK in Java. It perfectly solved my problem.

You can view this on the Github page:

https://github.com/zly394/EmojiRegex

Notes:

The answer which provided by @Eric Nakagawa contains some errors, which cannot be operated properly.




回答11:


Just to use regex to solve it:

s = s.replaceAll("\\p{So}+", "");

You can find it in

http://www.regular-expressions.info/unicode.html

https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#OTHER_SYMBOL





回答12:


This is what I use to remove emojis and so far it has shown to allow all other alphabets.

private static String remove_Emojis(String name)
{  

    //we will store all the letters in this array
    ArrayList<Character> nonEmoji = new ArrayList<>();

     // and when we rebuild the name we will put it in here
    String newName = "";


    // we are going to loop through checking each character to see if its an emoji or not
    for (int i = 0; i < name.length(); i++) 
     {

        if (Character.isLetterOrDigit(name.charAt(i)))
        {
            nonEmoji.add(name.charAt(i));
        } 

         else 
          {
             // this is just a 2nd check in case the other method didn't allow some letter
            if (Build.VERSION.SDK_INT > 18)
            {
                if (Character.isAlphabetic(name.charAt(i))) 
                {
                    nonEmoji.add(name.charAt(i));
                }
            }
        }


        if (name.charAt(i) == ' ')// may want to consider adding or '-' or '\''
        {
            nonEmoji.add(i);// just add it
        }

        if (name.charAt(i) == '@' && !name.contains(" "))// I put this in for email addresses
        {
            nonEmoji.add('@');
        }
    }

    // finally just loop through building it back out
    for (int i = 0; i < nonEmoji.size(); i++) {

        newName += nonEmoji.get(i);
    }

    return newName;
}



回答13:


You can generate your own regex whenever the spec changes.
This tool (screenshot here).

For utf-8/32 mode (stringed), expanded mode :

"     # Use the 'Mega-Conversion' tool to change into other syntaxes"
"     # -------------------------------------------------------------"
"     "
"     [#*0-9] \\x{FE0F} \\x{20E3}"
"  |  [\\x{A9}\\x{AE}\\x{203C}\\x{2049}\\x{2122}\\x{2139}\\x{2194}-\\x{2199}\\x{21A9}\\x{21AA}\\x{231A}\\x{231B}\\x{2328}\\x{23CF}\\x{23E9}-\\x{23F3}\\x{23F8}-\\x{23FA}\\x{24C2}\\x{25AA}\\x{25AB}\\x{25B6}\\x{25C0}\\x{25FB}-\\x{25FE}\\x{2600}-\\x{2604}\\x{260E}\\x{2611}\\x{2614}\\x{2615}\\x{2618}]"
"  |  \\x{261D} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{2620}\\x{2622}\\x{2623}\\x{2626}\\x{262A}\\x{262E}\\x{262F}\\x{2638}-\\x{263A}\\x{2640}\\x{2642}\\x{2648}-\\x{2653}\\x{265F}\\x{2660}\\x{2663}\\x{2665}\\x{2666}\\x{2668}\\x{267B}\\x{267E}\\x{267F}\\x{2692}-\\x{2697}\\x{2699}\\x{269B}\\x{269C}\\x{26A0}\\x{26A1}\\x{26AA}\\x{26AB}\\x{26B0}\\x{26B1}\\x{26BD}\\x{26BE}\\x{26C4}\\x{26C5}\\x{26C8}\\x{26CE}\\x{26CF}\\x{26D1}\\x{26D3}\\x{26D4}\\x{26E9}\\x{26EA}\\x{26F0}-\\x{26F5}\\x{26F7}\\x{26F8}]"
"  |  \\x{26F9}"
"     (?:"
"          \\x{FE0F} \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  [\\x{26FA}\\x{26FD}\\x{2702}\\x{2705}\\x{2708}\\x{2709}]"
"  |  [\\x{270A}-\\x{270D}] [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{270F}\\x{2712}\\x{2714}\\x{2716}\\x{271D}\\x{2721}\\x{2728}\\x{2733}\\x{2734}\\x{2744}\\x{2747}\\x{274C}\\x{274E}\\x{2753}-\\x{2755}\\x{2757}\\x{2763}\\x{2764}\\x{2795}-\\x{2797}\\x{27A1}\\x{27B0}\\x{27BF}\\x{2934}\\x{2935}\\x{2B05}-\\x{2B07}\\x{2B1B}\\x{2B1C}\\x{2B50}\\x{2B55}\\x{3030}\\x{303D}\\x{3297}\\x{3299}\\x{1F004}\\x{1F0CF}\\x{1F170}\\x{1F171}\\x{1F17E}\\x{1F17F}\\x{1F18E}\\x{1F191}-\\x{1F19A}]"
"  |  \\x{1F1E6} [\\x{1F1E8}-\\x{1F1EC}\\x{1F1EE}\\x{1F1F1}\\x{1F1F2}\\x{1F1F4}\\x{1F1F6}-\\x{1F1FA}\\x{1F1FC}\\x{1F1FD}\\x{1F1FF}]"
"  |  \\x{1F1E7} [\\x{1F1E6}\\x{1F1E7}\\x{1F1E9}-\\x{1F1EF}\\x{1F1F1}-\\x{1F1F4}\\x{1F1F6}-\\x{1F1F9}\\x{1F1FB}\\x{1F1FC}\\x{1F1FE}\\x{1F1FF}]"
"  |  \\x{1F1E8} [\\x{1F1E6}\\x{1F1E8}\\x{1F1E9}\\x{1F1EB}-\\x{1F1EE}\\x{1F1F0}-\\x{1F1F5}\\x{1F1F7}\\x{1F1FA}-\\x{1F1FF}]"
"  |  \\x{1F1E9} [\\x{1F1EA}\\x{1F1EC}\\x{1F1EF}\\x{1F1F0}\\x{1F1F2}\\x{1F1F4}\\x{1F1FF}]"
"  |  \\x{1F1EA} [\\x{1F1E6}\\x{1F1E8}\\x{1F1EA}\\x{1F1EC}\\x{1F1ED}\\x{1F1F7}-\\x{1F1FA}]"
"  |  \\x{1F1EB} [\\x{1F1EE}-\\x{1F1F0}\\x{1F1F2}\\x{1F1F4}\\x{1F1F7}]"
"  |  \\x{1F1EC} [\\x{1F1E6}\\x{1F1E7}\\x{1F1E9}-\\x{1F1EE}\\x{1F1F1}-\\x{1F1F3}\\x{1F1F5}-\\x{1F1FA}\\x{1F1FC}\\x{1F1FE}]"
"  |  \\x{1F1ED} [\\x{1F1F0}\\x{1F1F2}\\x{1F1F3}\\x{1F1F7}\\x{1F1F9}\\x{1F1FA}]"
"  |  \\x{1F1EE} [\\x{1F1E8}-\\x{1F1EA}\\x{1F1F1}-\\x{1F1F4}\\x{1F1F6}-\\x{1F1F9}]"
"  |  \\x{1F1EF} [\\x{1F1EA}\\x{1F1F2}\\x{1F1F4}\\x{1F1F5}]"
"  |  \\x{1F1F0} [\\x{1F1EA}\\x{1F1EC}-\\x{1F1EE}\\x{1F1F2}\\x{1F1F3}\\x{1F1F5}\\x{1F1F7}\\x{1F1FC}\\x{1F1FE}\\x{1F1FF}]"
"  |  \\x{1F1F1} [\\x{1F1E6}-\\x{1F1E8}\\x{1F1EE}\\x{1F1F0}\\x{1F1F7}-\\x{1F1FB}\\x{1F1FE}]"
"  |  \\x{1F1F2} [\\x{1F1E6}\\x{1F1E8}-\\x{1F1ED}\\x{1F1F0}-\\x{1F1FF}]"
"  |  \\x{1F1F3} [\\x{1F1E6}\\x{1F1E8}\\x{1F1EA}-\\x{1F1EC}\\x{1F1EE}\\x{1F1F1}\\x{1F1F4}\\x{1F1F5}\\x{1F1F7}\\x{1F1FA}\\x{1F1FF}]"
"  |  \\x{1F1F4} \\x{1F1F2}"
"  |  \\x{1F1F5} [\\x{1F1E6}\\x{1F1EA}-\\x{1F1ED}\\x{1F1F0}-\\x{1F1F3}\\x{1F1F7}-\\x{1F1F9}\\x{1F1FC}\\x{1F1FE}]"
"  |  \\x{1F1F6} \\x{1F1E6}"
"  |  \\x{1F1F7} [\\x{1F1EA}\\x{1F1F4}\\x{1F1F8}\\x{1F1FA}\\x{1F1FC}]"
"  |  \\x{1F1F8} [\\x{1F1E6}-\\x{1F1EA}\\x{1F1EC}-\\x{1F1F4}\\x{1F1F7}-\\x{1F1F9}\\x{1F1FB}\\x{1F1FD}-\\x{1F1FF}]"
"  |  \\x{1F1F9} [\\x{1F1E6}\\x{1F1E8}\\x{1F1E9}\\x{1F1EB}-\\x{1F1ED}\\x{1F1EF}-\\x{1F1F4}\\x{1F1F7}\\x{1F1F9}\\x{1F1FB}\\x{1F1FC}\\x{1F1FF}]"
"  |  \\x{1F1FA} [\\x{1F1E6}\\x{1F1EC}\\x{1F1F2}\\x{1F1F3}\\x{1F1F8}\\x{1F1FE}\\x{1F1FF}]"
"  |  \\x{1F1FB} [\\x{1F1E6}\\x{1F1E8}\\x{1F1EA}\\x{1F1EC}\\x{1F1EE}\\x{1F1F3}\\x{1F1FA}]"
"  |  \\x{1F1FC} [\\x{1F1EB}\\x{1F1F8}]"
"  |  \\x{1F1FD} \\x{1F1F0}"
"  |  \\x{1F1FE} [\\x{1F1EA}\\x{1F1F9}]"
"  |  \\x{1F1FF} [\\x{1F1E6}\\x{1F1F2}\\x{1F1FC}]"
"  |  [\\x{1F201}\\x{1F202}\\x{1F21A}\\x{1F22F}\\x{1F232}-\\x{1F23A}\\x{1F250}\\x{1F251}\\x{1F300}-\\x{1F321}\\x{1F324}-\\x{1F384}]"
"  |  \\x{1F385} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{1F386}-\\x{1F393}\\x{1F396}\\x{1F397}\\x{1F399}-\\x{1F39B}\\x{1F39E}-\\x{1F3C1}]"
"  |  \\x{1F3C2} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{1F3C3}\\x{1F3C4}]"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  [\\x{1F3C5}\\x{1F3C6}]"
"  |  \\x{1F3C7} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{1F3C8}\\x{1F3C9}]"
"  |  \\x{1F3CA}"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  [\\x{1F3CB}\\x{1F3CC}]"
"     (?:"
"          \\x{FE0F} \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  [\\x{1F3CD}-\\x{1F3F0}]"
"  |  \\x{1F3F3}"
"     (?: \\x{FE0F} \\x{200D} \\x{1F308} )?"
"  |  \\x{1F3F4}"
"     (?:"
"          \\x{200D} \\x{2620} \\x{FE0F}"
"       |  \\x{E0067} \\x{E0062}"
"          (?:"
"               \\x{E0065} \\x{E006E} \\x{E0067}"
"            |  \\x{E0073} \\x{E0063} \\x{E0074}"
"            |  \\x{E0077} \\x{E006C} \\x{E0073}"
"          )"
"          \\x{E007F}"
"     )?"
"  |  [\\x{1F3F5}\\x{1F3F7}-\\x{1F440}]"
"  |  \\x{1F441}"
"     (?: \\x{FE0F} \\x{200D} \\x{1F5E8} \\x{FE0F} )?"
"  |  [\\x{1F442}\\x{1F443}] [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{1F444}\\x{1F445}]"
"  |  [\\x{1F446}-\\x{1F450}] [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{1F451}-\\x{1F465}]"
"  |  [\\x{1F466}\\x{1F467}] [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  \\x{1F468}"
"     (?:"
"          \\x{200D}"
"          (?:"
"               [\\x{2695}\\x{2696}\\x{2708}] \\x{FE0F}"
"            |  \\x{2764} \\x{FE0F} \\x{200D}"
"               (?: \\x{1F48B} \\x{200D} )?"
"               \\x{1F468}"
"            |  [\\x{1F33E}\\x{1F373}\\x{1F393}\\x{1F3A4}\\x{1F3A8}\\x{1F3EB}\\x{1F3ED}]"
"            |  \\x{1F466}"
"               (?: \\x{200D} \\x{1F466} )?"
"            |  \\x{1F467}"
"               (?: \\x{200D} [\\x{1F466}\\x{1F467}] )?"
"            |  [\\x{1F468}\\x{1F469}] \\x{200D}"
"               (?:"
"                    \\x{1F466}"
"                    (?: \\x{200D} \\x{1F466} )?"
"                 |  \\x{1F467}"
"                    (?: \\x{200D} [\\x{1F466}\\x{1F467}] )?"
"               )"
"            |  [\\x{1F4BB}\\x{1F4BC}\\x{1F527}\\x{1F52C}\\x{1F680}\\x{1F692}\\x{1F9B0}-\\x{1F9B3}]"
"          )"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?:"
"               \\x{200D}"
"               (?:"
"                    [\\x{2695}\\x{2696}\\x{2708}] \\x{FE0F}"
"                 |  [\\x{1F33E}\\x{1F373}\\x{1F393}\\x{1F3A4}\\x{1F3A8}\\x{1F3EB}\\x{1F3ED}\\x{1F4BB}\\x{1F4BC}\\x{1F527}\\x{1F52C}\\x{1F680}\\x{1F692}\\x{1F9B0}-\\x{1F9B3}]"
"               )"
"          )?"
"     )?"
"  |  \\x{1F469}"
"     (?:"
"          \\x{200D}"
"          (?:"
"               [\\x{2695}\\x{2696}\\x{2708}] \\x{FE0F}"
"            |  \\x{2764} \\x{FE0F} \\x{200D}"
"               (?: \\x{1F48B} \\x{200D} )?"
"               [\\x{1F468}\\x{1F469}]"
"            |  [\\x{1F33E}\\x{1F373}\\x{1F393}\\x{1F3A4}\\x{1F3A8}\\x{1F3EB}\\x{1F3ED}]"
"            |  \\x{1F466}"
"               (?: \\x{200D} \\x{1F466} )?"
"            |  \\x{1F467}"
"               (?: \\x{200D} [\\x{1F466}\\x{1F467}] )?"
"            |  \\x{1F469} \\x{200D}"
"               (?:"
"                    \\x{1F466}"
"                    (?: \\x{200D} \\x{1F466} )?"
"                 |  \\x{1F467}"
"                    (?: \\x{200D} [\\x{1F466}\\x{1F467}] )?"
"               )"
"            |  [\\x{1F4BB}\\x{1F4BC}\\x{1F527}\\x{1F52C}\\x{1F680}\\x{1F692}\\x{1F9B0}-\\x{1F9B3}]"
"          )"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?:"
"               \\x{200D}"
"               (?:"
"                    [\\x{2695}\\x{2696}\\x{2708}] \\x{FE0F}"
"                 |  [\\x{1F33E}\\x{1F373}\\x{1F393}\\x{1F3A4}\\x{1F3A8}\\x{1F3EB}\\x{1F3ED}\\x{1F4BB}\\x{1F4BC}\\x{1F527}\\x{1F52C}\\x{1F680}\\x{1F692}\\x{1F9B0}-\\x{1F9B3}]"
"               )"
"          )?"
"     )?"
"  |  [\\x{1F46A}-\\x{1F46D}]"
"  |  \\x{1F46E}"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  \\x{1F46F}"
"     (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"  |  \\x{1F470} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  \\x{1F471}"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  \\x{1F472} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  \\x{1F473}"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  [\\x{1F474}-\\x{1F476}] [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  \\x{1F477}"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  \\x{1F478} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{1F479}-\\x{1F47B}]"
"  |  \\x{1F47C} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{1F47D}-\\x{1F480}]"
"  |  [\\x{1F481}\\x{1F482}]"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  \\x{1F483} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  \\x{1F484}"
"  |  \\x{1F485} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{1F486}\\x{1F487}]"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  [\\x{1F488}-\\x{1F4A9}]"
"  |  \\x{1F4AA} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{1F4AB}-\\x{1F4FD}\\x{1F4FF}-\\x{1F53D}\\x{1F549}-\\x{1F54E}\\x{1F550}-\\x{1F567}\\x{1F56F}\\x{1F570}\\x{1F573}]"
"  |  \\x{1F574} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  \\x{1F575}"
"     (?:"
"          \\x{FE0F} \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  [\\x{1F576}-\\x{1F579}]"
"  |  \\x{1F57A} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{1F587}\\x{1F58A}-\\x{1F58D}]"
"  |  [\\x{1F590}\\x{1F595}\\x{1F596}] [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{1F5A4}\\x{1F5A5}\\x{1F5A8}\\x{1F5B1}\\x{1F5B2}\\x{1F5BC}\\x{1F5C2}-\\x{1F5C4}\\x{1F5D1}-\\x{1F5D3}\\x{1F5DC}-\\x{1F5DE}\\x{1F5E1}\\x{1F5E3}\\x{1F5E8}\\x{1F5EF}\\x{1F5F3}\\x{1F5FA}-\\x{1F644}]"
"  |  [\\x{1F645}-\\x{1F647}]"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  [\\x{1F648}-\\x{1F64A}]"
"  |  \\x{1F64B}"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  \\x{1F64C} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{1F64D}\\x{1F64E}]"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  \\x{1F64F} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{1F680}-\\x{1F6A2}]"
"  |  \\x{1F6A3}"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  [\\x{1F6A4}-\\x{1F6B3}]"
"  |  [\\x{1F6B4}-\\x{1F6B6}]"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  [\\x{1F6B7}-\\x{1F6BF}]"
"  |  \\x{1F6C0} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{1F6C1}-\\x{1F6C5}\\x{1F6CB}]"
"  |  \\x{1F6CC} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{1F6CD}-\\x{1F6D2}\\x{1F6E0}-\\x{1F6E5}\\x{1F6E9}\\x{1F6EB}\\x{1F6EC}\\x{1F6F0}\\x{1F6F3}-\\x{1F6F9}\\x{1F910}-\\x{1F917}]"
"  |  [\\x{1F918}-\\x{1F91C}] [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  \\x{1F91D}"
"  |  [\\x{1F91E}\\x{1F91F}] [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{1F920}-\\x{1F925}]"
"  |  \\x{1F926}"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  [\\x{1F927}-\\x{1F92F}]"
"  |  [\\x{1F930}-\\x{1F936}] [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  \\x{1F937}"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  [\\x{1F938}\\x{1F939}]"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  \\x{1F93A}"
"  |  \\x{1F93C}"
"     (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"  |  [\\x{1F93D}\\x{1F93E}]"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  [\\x{1F940}-\\x{1F945}\\x{1F947}-\\x{1F970}\\x{1F973}-\\x{1F976}\\x{1F97A}\\x{1F97C}-\\x{1F9A2}\\x{1F9B0}-\\x{1F9B4}]"
"  |  [\\x{1F9B5}\\x{1F9B6}] [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  \\x{1F9B7}"
"  |  [\\x{1F9B8}\\x{1F9B9}]"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  [\\x{1F9C0}-\\x{1F9C2}\\x{1F9D0}]"
"  |  [\\x{1F9D1}-\\x{1F9D5}] [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  \\x{1F9D6}"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  [\\x{1F9D7}-\\x{1F9DD}]"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  [\\x{1F9DE}\\x{1F9DF}]"
"     (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"  |  [\\x{1F9E0}-\\x{1F9FF}]"

For utf-16 mode (stringed), compressed mode :

"[#*0-9]\\uFE0F\\u20E3|[\\u00A9\\u00AE\\u203C\\u2049\\u2122\\u2139\\u2"
"194-\\u2199\\u21A9\\u21AA\\u231A\\u231B\\u2328\\u23CF\\u23E9-\\u23F3\\"
"u23F8-\\u23FA\\u24C2\\u25AA\\u25AB\\u25B6\\u25C0\\u25FB-\\u25FE\\u260"
"0-\\u2604\\u260E\\u2611\\u2614\\u2615\\u2618]|\\u261D(?:\\uD83C[\\uDF"
"FB-\\uDFFF])?|[\\u2620\\u2622\\u2623\\u2626\\u262A\\u262E\\u262F\\u26"
"38-\\u263A\\u2640\\u2642\\u2648-\\u2653\\u265F\\u2660\\u2663\\u2665\\u"
"2666\\u2668\\u267B\\u267E\\u267F\\u2692-\\u2697\\u2699\\u269B\\u269C\\"
"u26A0\\u26A1\\u26AA\\u26AB\\u26B0\\u26B1\\u26BD\\u26BE\\u26C4\\u26C5\\"
"u26C8\\u26CE\\u26CF\\u26D1\\u26D3\\u26D4\\u26E9\\u26EA\\u26F0-\\u26F5"
"\\u26F7\\u26F8]|\\u26F9(?:\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640"
"\\u2642]\\uFE0F)?|\\uFE0F\\u200D[\\u2640\\u2642]\\uFE0F)?|[\\u26FA\\u"
"26FD\\u2702\\u2705\\u2708\\u2709]|[\\u270A-\\u270D](?:\\uD83C[\\uDFF"
"B-\\uDFFF])?|[\\u270F\\u2712\\u2714\\u2716\\u271D\\u2721\\u2728\\u273"
"3\\u2734\\u2744\\u2747\\u274C\\u274E\\u2753-\\u2755\\u2757\\u2763\\u27"
"64\\u2795-\\u2797\\u27A1\\u27B0\\u27BF\\u2934\\u2935\\u2B05-\\u2B07\\u"
"2B1B\\u2B1C\\u2B50\\u2B55\\u3030\\u303D\\u3297\\u3299]|\\uD83C(?:[\\u"
"DC04\\uDCCF\\uDD70\\uDD71\\uDD7E\\uDD7F\\uDD8E\\uDD91-\\uDD9A]|\\uDDE"
"6\\uD83C[\\uDDE8-\\uDDEC\\uDDEE\\uDDF1\\uDDF2\\uDDF4\\uDDF6-\\uDDFA\\u"
"DDFC\\uDDFD\\uDDFF]|\\uDDE7\\uD83C[\\uDDE6\\uDDE7\\uDDE9-\\uDDEF\\uDD"
"F1-\\uDDF4\\uDDF6-\\uDDF9\\uDDFB\\uDDFC\\uDDFE\\uDDFF]|\\uDDE8\\uD83C"
"[\\uDDE6\\uDDE8\\uDDE9\\uDDEB-\\uDDEE\\uDDF0-\\uDDF5\\uDDF7\\uDDFA-\\u"
"DDFF]|\\uDDE9\\uD83C[\\uDDEA\\uDDEC\\uDDEF\\uDDF0\\uDDF2\\uDDF4\\uDDF"
"F]|\\uDDEA\\uD83C[\\uDDE6\\uDDE8\\uDDEA\\uDDEC\\uDDED\\uDDF7-\\uDDFA]"
"|\\uDDEB\\uD83C[\\uDDEE-\\uDDF0\\uDDF2\\uDDF4\\uDDF7]|\\uDDEC\\uD83C["
"\\uDDE6\\uDDE7\\uDDE9-\\uDDEE\\uDDF1-\\uDDF3\\uDDF5-\\uDDFA\\uDDFC\\uD"
"DFE]|\\uDDED\\uD83C[\\uDDF0\\uDDF2\\uDDF3\\uDDF7\\uDDF9\\uDDFA]|\\uDD"
"EE\\uD83C[\\uDDE8-\\uDDEA\\uDDF1-\\uDDF4\\uDDF6-\\uDDF9]|\\uDDEF\\uD8"
"3C[\\uDDEA\\uDDF2\\uDDF4\\uDDF5]|\\uDDF0\\uD83C[\\uDDEA\\uDDEC-\\uDDE"
"E\\uDDF2\\uDDF3\\uDDF5\\uDDF7\\uDDFC\\uDDFE\\uDDFF]|\\uDDF1\\uD83C[\\u"
"DDE6-\\uDDE8\\uDDEE\\uDDF0\\uDDF7-\\uDDFB\\uDDFE]|\\uDDF2\\uD83C[\\uD"
"DE6\\uDDE8-\\uDDED\\uDDF0-\\uDDFF]|\\uDDF3\\uD83C[\\uDDE6\\uDDE8\\uDD"
"EA-\\uDDEC\\uDDEE\\uDDF1\\uDDF4\\uDDF5\\uDDF7\\uDDFA\\uDDFF]|\\uDDF4\\"
"uD83C\\uDDF2|\\uDDF5\\uD83C[\\uDDE6\\uDDEA-\\uDDED\\uDDF0-\\uDDF3\\uD"
"DF7-\\uDDF9\\uDDFC\\uDDFE]|\\uDDF6\\uD83C\\uDDE6|\\uDDF7\\uD83C[\\uDD"
"EA\\uDDF4\\uDDF8\\uDDFA\\uDDFC]|\\uDDF8\\uD83C[\\uDDE6-\\uDDEA\\uDDEC"
"-\\uDDF4\\uDDF7-\\uDDF9\\uDDFB\\uDDFD-\\uDDFF]|\\uDDF9\\uD83C[\\uDDE6"
"\\uDDE8\\uDDE9\\uDDEB-\\uDDED\\uDDEF-\\uDDF4\\uDDF7\\uDDF9\\uDDFB\\uDD"
"FC\\uDDFF]|\\uDDFA\\uD83C[\\uDDE6\\uDDEC\\uDDF2\\uDDF3\\uDDF8\\uDDFE\\"
"uDDFF]|\\uDDFB\\uD83C[\\uDDE6\\uDDE8\\uDDEA\\uDDEC\\uDDEE\\uDDF3\\uDD"
"FA]|\\uDDFC\\uD83C[\\uDDEB\\uDDF8]|\\uDDFD\\uD83C\\uDDF0|\\uDDFE\\uD8"
"3C[\\uDDEA\\uDDF9]|\\uDDFF\\uD83C[\\uDDE6\\uDDF2\\uDDFC]|[\\uDE01\\uD"
"E02\\uDE1A\\uDE2F\\uDE32-\\uDE3A\\uDE50\\uDE51\\uDF00-\\uDF21\\uDF24-"
"\\uDF84]|\\uDF85(?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDF86-\\uDF93\\uDF9"
"6\\uDF97\\uDF99-\\uDF9B\\uDF9E-\\uDFC1]|\\uDFC2(?:\\uD83C[\\uDFFB-\\u"
"DFFF])?|[\\uDFC3\\uDFC4](?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\"
"uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDFC5\\uDFC6"
"]|\\uDFC7(?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDFC8\\uDFC9]|\\uDFCA(?:\\"
"u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2"
"640\\u2642]\\uFE0F)?)?|[\\uDFCB\\uDFCC](?:\\uD83C[\\uDFFB-\\uDFFF]("
"?:\\u200D[\\u2640\\u2642]\\uFE0F)?|\\uFE0F\\u200D[\\u2640\\u2642]\\uF"
"E0F)?|[\\uDFCD-\\uDFF0]|\\uDFF3(?:\\uFE0F\\u200D\\uD83C\\uDF08)?|\\u"
"DFF4(?:\\u200D\\u2620\\uFE0F|\\uDB40\\uDC67\\uDB40\\uDC62\\uDB40(?:\\"
"uDC65\\uDB40\\uDC6E\\uDB40\\uDC67|\\uDC73\\uDB40\\uDC63\\uDB40\\uDC74"
"|\\uDC77\\uDB40\\uDC6C\\uDB40\\uDC73)\\uDB40\\uDC7F)?|[\\uDFF5\\uDFF7"
"-\\uDFFF])|\\uD83D(?:[\\uDC00-\\uDC40]|\\uDC41(?:\\uFE0F\\u200D\\uD8"
"3D\\uDDE8\\uFE0F)?|[\\uDC42\\uDC43](?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\"
"uDC44\\uDC45]|[\\uDC46-\\uDC50](?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDC"
"51-\\uDC65]|[\\uDC66\\uDC67](?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDC68(?"
":\\u200D(?:[\\u2695\\u2696\\u2708]\\uFE0F|\\u2764\\uFE0F\\u200D\\uD83"
"D(?:\\uDC8B\\u200D\\uD83D)?\\uDC68|\\uD83C[\\uDF3E\\uDF73\\uDF93\\uDF"
"A4\\uDFA8\\uDFEB\\uDFED]|\\uD83D(?:\\uDC66(?:\\u200D\\uD83D\\uDC66)?"
"|\\uDC67(?:\\u200D\\uD83D[\\uDC66\\uDC67])?|[\\uDC68\\uDC69]\\u200D\\"
"uD83D(?:\\uDC66(?:\\u200D\\uD83D\\uDC66)?|\\uDC67(?:\\u200D\\uD83D["
"\\uDC66\\uDC67])?)|[\\uDCBB\\uDCBC\\uDD27\\uDD2C\\uDE80\\uDE92])|\\uD"
"83E[\\uDDB0-\\uDDB3])|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D(?:[\\u2695"
"\\u2696\\u2708]\\uFE0F|\\uD83C[\\uDF3E\\uDF73\\uDF93\\uDFA4\\uDFA8\\uD"
"FEB\\uDFED]|\\uD83D[\\uDCBB\\uDCBC\\uDD27\\uDD2C\\uDE80\\uDE92]|\\uD8"
"3E[\\uDDB0-\\uDDB3]))?)?|\\uDC69(?:\\u200D(?:[\\u2695\\u2696\\u2708"
"]\\uFE0F|\\u2764\\uFE0F\\u200D\\uD83D(?:\\uDC8B\\u200D\\uD83D)?[\\uDC"
"68\\uDC69]|\\uD83C[\\uDF3E\\uDF73\\uDF93\\uDFA4\\uDFA8\\uDFEB\\uDFED]"
"|\\uD83D(?:\\uDC66(?:\\u200D\\uD83D\\uDC66)?|\\uDC67(?:\\u200D\\uD83"
"D[\\uDC66\\uDC67])?|\\uDC69\\u200D\\uD83D(?:\\uDC66(?:\\u200D\\uD83D"
"\\uDC66)?|\\uDC67(?:\\u200D\\uD83D[\\uDC66\\uDC67])?)|[\\uDCBB\\uDCB"
"C\\uDD27\\uDD2C\\uDE80\\uDE92])|\\uD83E[\\uDDB0-\\uDDB3])|\\uD83C[\\u"
"DFFB-\\uDFFF](?:\\u200D(?:[\\u2695\\u2696\\u2708]\\uFE0F|\\uD83C[\\u"
"DF3E\\uDF73\\uDF93\\uDFA4\\uDFA8\\uDFEB\\uDFED]|\\uD83D[\\uDCBB\\uDCB"
"C\\uDD27\\uDD2C\\uDE80\\uDE92]|\\uD83E[\\uDDB0-\\uDDB3]))?)?|[\\uDC6"
"A-\\uDC6D]|\\uDC6E(?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-"
"\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|\\uDC6F(?:\\u200D[\\u2"
"640\\u2642]\\uFE0F)?|\\uDC70(?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDC71(?"
":\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\"
"u2640\\u2642]\\uFE0F)?)?|\\uDC72(?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDC"
"73(?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u20"
"0D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDC74-\\uDC76](?:\\uD83C[\\uDFFB-\\"
"uDFFF])?|\\uDC77(?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\"
"uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|\\uDC78(?:\\uD83C[\\uDF"
"FB-\\uDFFF])?|[\\uDC79-\\uDC7B]|\\uDC7C(?:\\uD83C[\\uDFFB-\\uDFFF])"
"?|[\\uDC7D-\\uDC80]|[\\uDC81\\uDC82](?:\\u200D[\\u2640\\u2642]\\uFE0"
"F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|\\uD"
"C83(?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDC84|\\uDC85(?:\\uD83C[\\uDFFB-"
"\\uDFFF])?|[\\uDC86\\uDC87](?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C"
"[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDC88-\\uD"
"CA9]|\\uDCAA(?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDCAB-\\uDCFD\\uDCFF-\\"
"uDD3D\\uDD49-\\uDD4E\\uDD50-\\uDD67\\uDD6F\\uDD70\\uDD73]|\\uDD74(?:"
"\\uD83C[\\uDFFB-\\uDFFF])?|\\uDD75(?:\\uD83C[\\uDFFB-\\uDFFF](?:\\u2"
"00D[\\u2640\\u2642]\\uFE0F)?|\\uFE0F\\u200D[\\u2640\\u2642]\\uFE0F)?"
"|[\\uDD76-\\uDD79]|\\uDD7A(?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDD87\\uD"
"D8A-\\uDD8D]|[\\uDD90\\uDD95\\uDD96](?:\\uD83C[\\uDFFB-\\uDFFF])?|["
"\\uDDA4\\uDDA5\\uDDA8\\uDDB1\\uDDB2\\uDDBC\\uDDC2-\\uDDC4\\uDDD1-\\uDD"
"D3\\uDDDC-\\uDDDE\\uDDE1\\uDDE3\\uDDE8\\uDDEF\\uDDF3\\uDDFA-\\uDE44]|"
"[\\uDE45-\\uDE47](?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\"
"uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDE48-\\uDE4A]|\\uDE"
"4B(?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u20"
"0D[\\u2640\\u2642]\\uFE0F)?)?|\\uDE4C(?:\\uD83C[\\uDFFB-\\uDFFF])?|"
"[\\uDE4D\\uDE4E](?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\u"
"DFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|\\uDE4F(?:\\uD83C[\\uDFF"
"B-\\uDFFF])?|[\\uDE80-\\uDEA2]|\\uDEA3(?:\\u200D[\\u2640\\u2642]\\uF"
"E0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|["
"\\uDEA4-\\uDEB3]|[\\uDEB4-\\uDEB6](?:\\u200D[\\u2640\\u2642]\\uFE0F|"
"\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDE"
"B7-\\uDEBF]|\\uDEC0(?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDEC1-\\uDEC5\\u"
"DECB]|\\uDECC(?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDECD-\\uDED2\\uDEE0-"
"\\uDEE5\\uDEE9\\uDEEB\\uDEEC\\uDEF0\\uDEF3-\\uDEF9])|\\uD83E(?:[\\uDD"
"10-\\uDD17]|[\\uDD18-\\uDD1C](?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDD1D|"
"[\\uDD1E\\uDD1F](?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDD20-\\uDD25]|\\uD"
"D26(?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u2"
"00D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDD27-\\uDD2F]|[\\uDD30-\\uDD36]("
"?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDD37(?:\\u200D[\\u2640\\u2642]\\uFE0"
"F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\u"
"DD38\\uDD39](?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFF"
"F](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|\\uDD3A|\\uDD3C(?:\\u200D[\\"
"u2640\\u2642]\\uFE0F)?|[\\uDD3D\\uDD3E](?:\\u200D[\\u2640\\u2642]\\u"
"FE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|"
"[\\uDD40-\\uDD45\\uDD47-\\uDD70\\uDD73-\\uDD76\\uDD7A\\uDD7C-\\uDDA2\\"
"uDDB0-\\uDDB4]|[\\uDDB5\\uDDB6](?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDDB"
"7|[\\uDDB8\\uDDB9](?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-"
"\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDDC0-\\uDDC2\\uDDD"
"0]|[\\uDDD1-\\uDDD5](?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDDD6(?:\\u200D"
"[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u"
"2642]\\uFE0F)?)?|[\\uDDD7-\\uDDDD](?:\\u200D[\\u2640\\u2642]\\uFE0F"
"|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\uD"
"DDE\\uDDDF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?|[\\uDDE0-\\uDDFF])"



回答14:


Regex is too slow, and Emoji is updated very fast.

Try this project simple-emoji-4j

Compatible with Emoji 12.0 (2018.10.15)

Simple with:

EmojiUtils.containsEmoji(str)


来源:https://stackoverflow.com/questions/24840667/what-is-the-regex-to-extract-all-the-emojis-from-a-string

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!