Python emoji search and replace not working as expected

落花浮王杯, submitted on 2019-12-02 00:48:01

There are several issues here.

  • There are no capturing groups in the regex pattern, but the replacement pattern uses a \1 backreference to Group 1. The most natural workaround is to use a backreference to Group 0, i.e. the whole match: \g<0>.
  • The \1 in the replacement is not actually parsed as a backreference but as a character with octal value 1, because a backslash in a regular (non-raw) string literal forms an escape sequence. Here, it is an octal escape.
  • The + after the ] means that the regex engine must match 1 or more occurrences of text matching the character class, so you match sequences of emojis rather than each separate emoji.
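The three points above can be checked in isolation. The snippet below is only a minimal sketch: the single-range character class and the short sample string are simplifications assumed for illustration, not the pattern from the question.

```python
import re

# Issue 2: in a regular (non-raw) string literal, '\1' is an octal escape,
# i.e. the single control character U+0001, not a backslash followed by "1".
assert '\1' == '\x01' and len('\1') == 1
assert r'\1' == '\\1' and len(r'\1') == 2

# Simplified single-range class, assumed here only for illustration.
emoji = '[\U0001F600-\U0001F64F]'
sample = 'ok😘😘'

# Issue 1: in a raw string, \1 is parsed as a backreference, but the
# pattern has no Group 1, so the re module raises an error.
try:
    re.sub(emoji, r' \1 ', sample)
    group_error = None
except re.error as exc:
    group_error = exc

# \g<0> refers to the whole match and needs no capturing group.
per_emoji = re.sub(emoji, r' \g<0> ', sample)

# Issue 3: with +, a run of adjacent emojis is one match and is padded once.
per_run = re.sub(emoji + '+', r' \g<0> ', sample)

print(repr(per_emoji))
print(repr(per_run))
```

Without the + quantifier each emoji gets its own padding; with it, the two adjacent emojis stay glued together and only the run as a whole is padded.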

Use

import re

text = "I am very #happy man but😘😘 my wife😞 is not 😊😘"
print(text) #line a

reg = re.compile(u'['
    u'\U0001F300-\U0001F64F'
    u'\U0001F680-\U0001F6FF'
    u'\u2600-\u26FF\u2700-\u27BF]', 
    re.UNICODE)

# pad each emoji with a space on both sides
new_text = reg.sub(r' \g<0> ', text)
print(new_text)  # line b

# this is just to test if it can still identify the emojis in new_text
new_text2 = reg.sub(r'#\g<0>#', new_text)
print(new_text2)  # line c

See the Python demo, which prints:

I am very #happy man but😘😘 my wife😞 is not 😊😘
I am very #happy man but 😘  😘  my wife 😞  is not  😊  😘 
I am very #happy man but #😘#  #😘#  my wife #😞#  is not  #😊#  #😘# 
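As a further check that the padding does not hide any emoji from the pattern, one option is to compare what re.findall returns before and after the substitution. This is just a sketch; the variable names found_before and found_after are made up for the example.

```python
import re

# Same character class as in the answer, without the + quantifier.
reg = re.compile(
    '['
    '\U0001F300-\U0001F64F'
    '\U0001F680-\U0001F6FF'
    '\u2600-\u26FF\u2700-\u27BF]'
)

text = "I am very #happy man but😘😘 my wife😞 is not 😊😘"
new_text = reg.sub(r' \g<0> ', text)

# Each emoji should be found as a separate match in both strings.
found_before = reg.findall(text)
found_after = reg.findall(new_text)
print(found_after)
```

If the two lists are equal, the substitution only added spaces around the matches and left the emojis themselves untouched.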