Replace emoji Unicode symbols using a regexp in JavaScript

醉酒成梦 2020-12-15 10:42

As you all know, emoji symbols are encoded in up to 3 or 4 bytes, so a single emoji may occupy 2 characters in my string. For example \'

9 answers
  •  情书的邮戳
    2020-12-15 11:09

    This is somewhat old, but I was looking into this problem and it seems Bradley Momberger has posted a nice solution to it here: http://airhadoken.github.io/2015/04/22/javascript-string-handling-emoji.html

    The regex he proposes is:

    /[\uD800-\uDFFF]./ // This matches emoji
    

    This regex matches a surrogate code unit (emojis outside the Basic Multilingual Plane start with a head surrogate) and the character following it (which is assumed to be the tail surrogate). Thus, all emojis should be matched correctly and with

    .replace(/[\uD800-\uDFFF]./g,'')
    

    you should be able to remove all emojis.
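A minimal sketch of the idea above, assuming a sample string that contains one emoji outside the Basic Multilingual Plane. The helper name `stripSurrogatePairs` is made up for this illustration; only the regex comes from the answer.

```javascript
// An emoji outside the BMP is stored as two UTF-16 code units (a surrogate pair).
const sample = "Hi \u{1F600} there"; // U+1F600 GRINNING FACE
const emojiLength = "\u{1F600}".length;

// Hypothetical helper: remove every surrogate code unit plus the code unit after it.
function stripSurrogatePairs(text) {
  return text.replace(/[\uD800-\uDFFF]./g, "");
}

const stripped = stripSurrogatePairs(sample);
console.log(emojiLength); // 2
console.log(stripped);    // "Hi  there" (the two spaces around the emoji remain)
```

Because the pattern only looks at surrogates, it does nothing for emoji that live inside the BMP, which is what the edits below address.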

    Edit: Better regex found. The above regex misses some emojis.

    There is a Reddit post with a version for which I cannot find an emoji that escapes the rule. The Reddit post is here: https://www.reddit.com/r/tasker/comments/4vhf2f/how_to_regex_emojis_in_tasker_for_search_match_or/ And the regex is:

    /[\uD83C-\uDBFF\uDC00-\uDFFF]+/
    

    To match all occurrences, use the g flag:

    /[\uD83C-\uDBFF\uDC00-\uDFFF]+/g
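As a rough illustration of how the global version behaves, the sketch below runs it over a few assumed sample strings. The names `redditEmoji` and `stripEmojiRuns` are invented for this example.

```javascript
// The regex from the Reddit post, with the g flag so every run is matched.
const redditEmoji = /[\uD83C-\uDBFF\uDC00-\uDFFF]+/g;

// Hypothetical helper: drop every run of matching surrogate code units.
function stripEmojiRuns(text) {
  return text.replace(redditEmoji, "");
}

// A smiley followed by a two-letter flag: one contiguous run of surrogates.
const withFlag = stripEmojiRuns("a\u{1F600}\u{1F1E9}\u{1F1EA}b");
console.log(withFlag); // "ab"

// Separate emoji are returned as separate matches.
const found = "x\u{1F600}y\u{1F680}".match(redditEmoji);
console.log(found.length); // 2

// U+2728 SPARKLES is a single BMP code unit, so this pattern leaves it alone.
const sparkles = stripEmojiRuns("\u2728 stays");
console.log(sparkles === "\u2728 stays"); // true
```

The `+` quantifier means adjacent emoji are removed as a single match, which is fine for stripping but matters if you want to count them.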
    

    Second Edit: As CodeToad pointed out correctly, ✨ is not recognized by the above Regex, because it's in the dingbats block (thanks to air_hadoken).

    The lodash library came up with an excellent Emoji Regex block:

    (?:[\u2700-\u27bf]|(?:\ud83c[\udde6-\uddff]){2}|[\ud800-\udbff][\udc00-\udfff])[\ufe0e\ufe0f]?(?:[\u0300-\u036f\ufe20-\ufe23\u20d0-\u20f0]|\ud83c[\udffb-\udfff])?(?:\u200d(?:[^\ud800-\udfff]|(?:\ud83c[\udde6-\uddff]){2}|[\ud800-\udbff][\udc00-\udfff])[\ufe0e\ufe0f]?(?:[\u0300-\u036f\ufe20-\ufe23\u20d0-\u20f0]|\ud83c[\udffb-\udfff])?)*
    

    Kevin Scott nicely summarized what this regex covers in his blog post. Spoiler: it includes dingbats.
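One possible way to apply the lodash pattern quoted above is to wrap it in a regex literal with the g flag. This is only a sketch: the names `lodashEmoji` and `stripEmoji` are hypothetical, and the sample strings are assumptions chosen to exercise dingbats, skin-tone modifiers and zero-width-joiner sequences.

```javascript
// The pattern quoted above, copied into a regex literal with the g flag.
const lodashEmoji = /(?:[\u2700-\u27bf]|(?:\ud83c[\udde6-\uddff]){2}|[\ud800-\udbff][\udc00-\udfff])[\ufe0e\ufe0f]?(?:[\u0300-\u036f\ufe20-\ufe23\u20d0-\u20f0]|\ud83c[\udffb-\udfff])?(?:\u200d(?:[^\ud800-\udfff]|(?:\ud83c[\udde6-\uddff]){2}|[\ud800-\udbff][\udc00-\udfff])[\ufe0e\ufe0f]?(?:[\u0300-\u036f\ufe20-\ufe23\u20d0-\u20f0]|\ud83c[\udffb-\udfff])?)*/g;

// Hypothetical helper: remove everything the pattern matches.
function stripEmoji(text) {
  return text.replace(lodashEmoji, "");
}

// Sparkles (a dingbat) and a thumbs-up with a skin-tone modifier are both removed.
const cleaned = stripEmoji("Nice \u2728 work \u{1F44D}\u{1F3FD}!");
console.log(cleaned); // "Nice  work !"

// A family joined by zero-width joiners is treated as one match, not three.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";
const familyMatches = family.match(lodashEmoji);
console.log(familyMatches.length); // 1
```

Matching a whole joined sequence at once avoids leaving stray zero-width joiners behind after the replacement.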
