Regex matching emoticons

春和景丽 2020-12-10 07:06

We are working on a project where we want users to be able to use both emoji syntax (like :smile:, :heart:, :confused:,:stuck_ou

4 Answers
  • 2020-12-10 07:23

    Use a positive look-ahead that requires whitespace after the emoticon:

    ([\:\<]-?[)(|\\/pP3D])(?:(?=\s))
     |     | |            |
     |     | |            |
     |     | |            |-> look-ahead: the next character must be whitespace (it is not consumed)
     |     | |-> last part of the emoticon
     |     |-> it may have a `-` or not
     |-> first part of the emoticon
    

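    JavaScript does support look-ahead (it is look-behind that older engines lack), but if you would rather avoid look-arounds, capture the delimiter in a second group instead:

    /([\:\<]-?[)(|\\/pP3D])(\s|$)/g.exec('hi :) ;D');
    

    Then read the emoticon from capture group 1 of the returned array and ignore the last entry, which is the trailing whitespace (or an empty string at the end of the input).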
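
    The capture-the-delimiter approach can be sketched in JavaScript as follows. This is only an illustration: findEmoticons is a hypothetical helper name, the sample strings are made up, and the pattern assumes the same two character classes as above.

```javascript
// Hypothetical helper: return every emoticon that is followed by
// whitespace or by the end of the input.
function findEmoticons(text) {
  // Group 1 is the emoticon, group 2 is the delimiter (whitespace or "").
  const pattern = /([:<]-?[)(|\\\/pP3D])(\s|$)/g;
  // matchAll needs the g flag and yields one match array per hit.
  return Array.from(text.matchAll(pattern), (m) => m[1]);
}

console.log(findEmoticons('hi :) see http://x.org <3')); // [ ':)', '<3' ]
console.log(findEmoticons('hi :) ;D'));                  // [ ':)' ]
```

    The ":/" inside the URL is skipped because it is followed by another "/", and ";D" is skipped because ";" is not in the first character class.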

  • 2020-12-10 07:28

    I assume these emoticons will usually have whitespace before and after them. If so, \s may be what you're looking for: it matches any single whitespace character.

    Your regex would then become

    \s+(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)\s
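
    One caveat worth checking before adopting this pattern: both \s tokens consume characters, so an emoticon at the very start or end of the input is not matched, and two emoticons separated by a single space cannot both match. A small JavaScript sketch of that behaviour (emoticonsBetweenSpaces is a hypothetical helper name and the sample strings are made up):

```javascript
// The pattern from this answer, with the g flag so every match is visited.
const spaced = /\s+(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)\s/g;

// Hypothetical helper: list the emoticons captured in group 1.
function emoticonsBetweenSpaces(text) {
  return Array.from(text.matchAll(spaced), (m) => m[1]);
}

// The space after ":)" is consumed by the first match, so "<3" has no
// leading whitespace left to match against.
console.log(emoticonsBetweenSpaces('so :) <3 ok'));     // [ ':)' ]
console.log(emoticonsBetweenSpaces(':) at the start')); // []
console.log(emoticonsBetweenSpaces('the end :p'));      // []
```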
    
  • 2020-12-10 07:33

    Match emoji first (to take care of the :pencil: example) and then check for a terminating whitespace or newline:

    (\:\w+\:|\<[\/\\]?3|[\(\)\\D|\*\$][\-\^]?[\:\;\=]|[\:\;\=B8][\-\^]?[3DOPp\@\$\*\\\)\(\/\|])(?=\s|[\!\.\?]|$)
    

    This regex matches the following (preferring emoji), returning the match in capture group 1:

    :( :) :P :p :O :3 :| :/ :\ :$ :* :@
    :-( :-) :-P :-p :-O :-3 :-| :-/ :-\ :-$ :-* :-@
    :^( :^) :^P :^p :^O :^3 :^| :^/ :^\ :^$ :^* :^@
    ): (: $: *:
    )-: (-: $-: *-:
    )^: (^: $^: *^:
    <3 </3 <\3
    :smile: :hug: :pencil:
    

    It also supports terminal punctuation as a delimiter in addition to white space.

    You can see more details and test it here: https://regex101.com/r/aM3cU7/4
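
    For illustration, here is a minimal JavaScript sketch of applying this kind of pattern with a global search. The helper name extractEmoticons and the sample strings are hypothetical. The literal below drops the redundant escapes and writes the reversed-emoticon class as [()\\D|*$] (backslash, then a literal D), because inside a character class \D means "any non-digit" and would let text such as "note:" match.

```javascript
// Emoji names first, then hearts, then reversed and forward emoticons.
// The look-ahead accepts whitespace, terminal punctuation, or end of input.
const EMOTICON_PATTERN =
  /(:\w+:|<[\/\\]?3|[()\\D|*$][-^]?[:;=]|[:;=B8][-^]?[3DOPp@$*\\)(\/|])(?=\s|[!.?]|$)/g;

// Hypothetical helper: return the contents of capture group 1 for each match.
function extractEmoticons(text) {
  return Array.from(text.matchAll(EMOTICON_PATTERN), (m) => m[1]);
}

console.log(extractEmoticons('Take a :pencil: and smile :-) or not ): <3!'));
// [ ':pencil:', ':-)', '):', '<3' ]
console.log(extractEmoticons('note: done')); // []
```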

  • 2020-12-10 07:33

    You want regex look-arounds regarding spacing. Another answer here suggested a positive look-ahead, though I'd go double-negative:

    (?<!\S)(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)(?!\S)
    

    Older JavaScript engines (before ES2018) don't support (?<!pattern), but look-behind can be mimicked:

    test_string.replace(/(\S)?(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)(?!\S)/,
                        function($0, $1) { return $1 ? $0 : replacement_text; });
    

    All I did was prefix your pattern with (?<!\S) and suffix it with (?!\S). The prefix ensures the emoticon does not follow a non-whitespace character, so the only valid things before it are whitespace or nothing (start of line). The suffix does the same on the other side, ensuring the emoticon is not followed by a non-whitespace character. See also this more thorough regex walk-through.
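
    A runnable sketch of both variants, for comparison. The function names and the '[e]' placeholder are hypothetical, and the g flag is an addition here so that every emoticon in the string is visited. The native version assumes an engine with ES2018 look-behind support.

```javascript
// Emulated look-behind: optionally capture one preceding non-space
// character and, if there was one, put the whole match back unchanged.
function replaceEmoticons(text, replacement) {
  const pattern = /(\S)?(:\)|:\(|<3|:\/|:-\/|:\||:p)(?!\S)/g;
  return text.replace(pattern, (whole, before) => (before ? whole : replacement));
}

// Native look-behind (ES2018 and later): no callback needed.
function replaceEmoticonsNative(text, replacement) {
  return text.replace(/(?<!\S)(:\)|:\(|<3|:\/|:-\/|:\||:p)(?!\S)/g, replacement);
}

const input = 'see http://x.org :) x:) <3';
console.log(replaceEmoticons(input, '[e]'));       // see http://x.org [e] x:) [e]
console.log(replaceEmoticonsNative(input, '[e]')); // see http://x.org [e] x:) [e]
```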

    One of the comments on the question suggested \b (word boundary) markers. I don't recommend these. In fact, that suggestion does the opposite of what you want: \b:/ will match inside http://, since there is a word boundary between the p and the :. The same reasoning suggests \B (not a word boundary) instead, e.g. \B:/\B. This is more portable (it works in practically every regex engine, whereas look-arounds do not), so it is a reasonable choice when portability matters, but I prefer the look-arounds.
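
    The boundary behaviour described above can be checked directly. This is a small sketch with made-up sample strings; the last line shows a caveat of the \B form that applies to emoticons ending in a letter, such as :p, where the position after the letter is a word boundary whenever whitespace follows.

```javascript
const wordBoundary = /\b:\//;   // the \b suggestion
const nonBoundary = /\B:\/\B/;  // the \B alternative

const url = 'http://example.com';
const chat = 'hmm :/ ok';

console.log(wordBoundary.test(url));  // true  (boundary between "p" and ":")
console.log(wordBoundary.test(chat)); // false (space and ":" are both non-word)
console.log(nonBoundary.test(url));   // false
console.log(nonBoundary.test(chat));  // true

// Caveat: "p" followed by a space IS a word boundary, so \B fails here.
console.log(/\B:p\B/.test('ok :p ')); // false
```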
