Match unicode emoji in python regex

时光说笑 2021-01-18 02:49

I need to extract the text between a number and an emoticon in a string.

Example text:

blah xzuyguhbc ibcbb bqw 2 extract1  ☺️ jbjhcb 6 extract2          


        
3 Answers
  •  天命终不由人
    2021-01-18 03:30

    Emoji are spread across many different Unicode code points, so you have to list them explicitly in your regex or, if they fall within a specific range, use a character class. In this case your second symbol is not a standard emoji, just an ordinary Unicode character, but since its code point is greater than \u263a (the Unicode representation of ☺️) you can put it in a range that starts at \u263a:

    In [71]: s = 'blah xzuyguhbc ibcbb bqw 2 extract1  ☺️ jbjhcb 6 extract2 
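
The snippet above is cut off in this copy, so here is a minimal sketch of the range approach the answer describes, not a reconstruction of the original code. The second symbol in the sample string (U+2764) and the upper bound of the range (U+1F645) are assumptions chosen for illustration; adjust both to the symbols that actually occur in your data.

```python
import re

# Sample string written with escapes so it does not depend on file encoding.
# "\u263a\ufe0f" is the smiley followed by the emoji variation selector.
# ASSUMPTION: "\u2764" stands in for the asker's second symbol, which is
# missing from the truncated example.
s = "blah xzuyguhbc ibcbb bqw 2 extract1  \u263a\ufe0f jbjhcb 6 extract2 \u2764"

# \d+        the number that precedes the text
# \s*(.*?)   the text itself, captured lazily
# \s*[...]   optional spaces, then any character in the chosen range
# ASSUMPTION: U+1F645 is an illustrative upper bound, not a definitive
# "end of emoji" code point.
pattern = re.compile(r"\d+\s*(.*?)\s*[\u263a-\U0001f645]")

matches = pattern.findall(s)
print(matches)
```

Note that a range this wide also covers many non-emoji characters (for example CJK ideographs), so in practice it may be safer to narrow the class to the specific symbols or blocks you expect.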
