How can I split a string containing emoji into an array?

慢半拍i 2020-12-01 08:12

(You'll need Firefox or Safari to see the emoji in the code.)

I want to take a string of emoji in JavaScript and do something with the individual characters.

6 answers
  • 2020-12-01 08:37

    Edit: see Orlin Georgiev's answer for a proper solution in a library: https://github.com/orling/grapheme-splitter


    Thanks to this answer I made a function that takes a string and returns an array of emoji:

    // Splits on UTF-16 surrogate pairs; the capture group keeps each pair in the result.
    var emojiStringToArray = function (str) {
      var parts = str.split(/([\uD800-\uDBFF][\uDC00-\uDFFF])/);
      var arr = [];
      for (var i = 0; i < parts.length; i++) {
        var ch = parts[i];
        if (ch !== "") {
          arr.push(ch);
        }
      }
      return arr;
    };
    

    So

    emojiStringToArray("                                                                    
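
    As a rough illustration of what the function returns, here is a self-contained sketch. The input strings are made up and written with \u{...} escapes so they display anywhere, and the function is restated in compact form. Note that the regex only isolates surrogate pairs, so neighbouring characters that each fit in one UTF-16 unit stay glued together.

```javascript
// Compact restatement of the answer's function so this sketch runs on its own.
function emojiStringToArray(str) {
  return str
    .split(/([\uD800-\uDBFF][\uDC00-\uDFFF])/)
    .filter(function (piece) { return piece !== ""; });
}

// Hypothetical sample: three astral emoji (surrogate pairs) and one BMP symbol.
const sample = "\u{1F634}\u{1F604}\u26D4\u{1F3A0}";
const pieces = emojiStringToArray(sample);
console.log(sample.length);  // 7 UTF-16 code units
console.log(pieces.length);  // 4 elements

// Limitation: characters outside surrogate pairs are not separated from each other.
const glued = emojiStringToArray("\u26D4\u26D4");
console.log(glued.length);   // 1, both symbols end up in the same element
```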
  • 2020-12-01 08:38

    It can be done using the u flag of a regular expression. The regular expression is:

    /.*?/u
    

    Splitting on this pattern breaks the string at every character, whether or not it is an emoji. Each part of the pattern plays a role:

    • ? makes the quantifier lazy, so the match is as short as possible (zero characters) and the split happens at every position
    • * means zero or more
    • . matches any character except a line break
    • /u makes the engine work on whole code points, so an emoji stored as a surrogate pair is not cut in half

    With the question mark, the pattern matches an empty string between every two characters. Without it, /.*/u is greedy and consumes everything up to the next line break.

    var string = "                                                                    
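
    A minimal sketch of the difference the u flag makes when splitting on this pattern. The sample string is made up and uses \u{...} escapes instead of literal emoji.

```javascript
// Hypothetical sample: two astral emoji followed by one BMP symbol.
const text = "\u{1F634}\u{1F604}\u26D4";

// With the u flag, the empty match advances by whole code points.
const byCodePoint = text.split(/.*?/u);
console.log(byCodePoint.length); // 3

// Without the u flag, the same pattern advances by UTF-16 code units,
// so each surrogate pair is cut into two unpaired halves.
const byCodeUnit = text.split(/.*?/);
console.log(byCodeUnit.length);  // 5
```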
  • 2020-12-01 08:39

    The grapheme-splitter library does just that. It is fully compatible even with old browsers and works not just with emoji but with all sorts of exotic characters: https://github.com/orling/grapheme-splitter. You are likely to miss edge cases in any home-brew solution. This one is based on the UAX #29 Unicode standard.
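
    To show what grapheme-level splitting in the sense of UAX #29 gives you, here is a minimal sketch. It does not use grapheme-splitter itself. As a stand-in it uses the built-in Intl.Segmenter, which is not mentioned in this thread and is only available in recent runtimes. The helper name and the sample string are made up.

```javascript
// Stand-in for a grapheme library: Intl.Segmenter with grapheme granularity.
// Assumes a runtime that ships Intl.Segmenter (recent Node.js and browsers).
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

function splitGraphemesSketch(str) { // hypothetical helper name
  return Array.from(segmenter.segment(str), function (s) { return s.segment; });
}

// "a", a flag (two regional indicators), and a family (three emoji joined by ZWJ).
const mixed = "a\u{1F1E9}\u{1F1EA}\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";
const graphemes = splitGraphemesSketch(mixed);
console.log(graphemes.length);  // 3 user-perceived characters
console.log([...mixed].length); // 8 code points
```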

  • 2020-12-01 08:41

    JavaScript ES6 has a solution for a real split:

    [..."                                                                    
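
    A short sketch of the spread approach next to split(''), using a made-up sample string written with \u{...} escapes.

```javascript
// Hypothetical sample: two astral emoji and one BMP symbol.
const emojiStr = "\u{1F634}\u{1F604}\u26D4";

// split('') works on UTF-16 code units and tears surrogate pairs apart.
const codeUnits = emojiStr.split("");
console.log(codeUnits.length);  // 5

// Spread uses the string iterator, which yields whole code points.
const codePoints = [...emojiStr];
console.log(codePoints.length); // 3
console.log(codePoints[0] === "\u{1F634}"); // true
```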
  • 2020-12-01 08:41

    The modern, proper way to split a Unicode string into its code points is to use Array.from(str) instead of str.split('').
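
    A minimal sketch of Array.from on a made-up sample. It also shows a caveat: the split is by code point, so a symbol built from several code points, such as a flag, still comes apart.

```javascript
// Hypothetical sample: one astral emoji followed by a flag.
// The flag is a single visible symbol made of two regional-indicator code points.
const input = "\u{1F634}\u{1F1E9}\u{1F1EA}";

const chars = Array.from(input);
console.log(chars.length); // 3: the emoji plus the two halves of the flag

// Array.from also accepts a mapping function, e.g. to list code point values.
const hex = Array.from(input, function (c) {
  return c.codePointAt(0).toString(16);
});
console.log(hex.join(" ")); // 1f634 1f1e9 1f1ea
```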

  • 2020-12-01 08:44

    The Grapheme Splitter library by Orlin Georgiev is pretty amazing.

    However, it hasn't been updated in a while, and presently (Sep 2020) it only supports Unicode 10 and below.

    For an updated version of Grapheme Splitter built in TypeScript with Unicode 13 support, have a look at: https://github.com/flmnt/graphemer

    Here is a quick example:

    import Graphemer from 'graphemer';
    
    const splitter = new Graphemer();
    
    const string = "                                                                    