Python: re.sub() and Unicode
Question

I want to replace every emoji with '' (the empty string), but my regex doesn't work. For example, given

content = u'?\u86cb\u767d12\U0001f633\uff0c\u4f53\u6e29\u65e9\u6668\u6b63\u5e38\uff0c\u5348\u540e\u665a\u95f4\u53d1\u70ed\uff0c\u6211\u73b0\u5728\u8be5\u548b\U0001f633?'

I want to remove every character of the form \U0001f633, so I wrote:

print re.sub(ur'\\U[0-9a-fA-F]{8}','',content)

But it doesn't work. Thanks a lot.

Answer 1:

You won't be able to recognize properly decoded Unicode code points that way (as