Python - replace Unicode emojis with ASCII characters

对着背影说爱祢 submitted on 2019-12-04 11:29:45

Using the tip about unicodedata.name and some further research, I managed to put this together:

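import unicodedata
from unidecode import unidecode  # third-party package: pip install Unidecode

def deEmojify(inputString):
    returnString = ""

    for character in inputString:
        try:
            # ASCII characters are kept unchanged
            character.encode("ascii")
            returnString += character
        except UnicodeEncodeError:
            # 1) closest ASCII transliteration, e.g. "š" -> "s"
            replaced = unidecode(character)
            if replaced != '':
                returnString += replaced
            else:
                # 2) no transliteration: use the Unicode character name
                try:
                    returnString += "[" + unicodedata.name(character) + "]"
                except ValueError:
                    # 3) the character has no name: use a generic marker
                    returnString += "[x]"

    return returnString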

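Basically, ASCII characters are kept as they are. For every other character it first tries to find the closest ASCII representation with unidecode; if that returns an empty string, it uses the character's Unicode name in brackets; and if the character has no name either, it falls back to the simple marker [x].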
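If installing Unidecode is not an option, the same fallback order can be approximated with the standard library alone. This is a swapped-in technique, not the author's code: step 1 uses Unicode NFKD decomposition and drops combining marks instead of unidecode. That handles accented letters such as š or Å, but characters with no decomposition (đ, Ø, Æ, Chinese characters) fall through to the name step instead of being transliterated. The helper name to_ascii_stdlib is hypothetical, and the sketch assumes Python 3.7+ for str.isascii.

```python
import unicodedata


def to_ascii_stdlib(text: str) -> str:
    """Hypothetical stdlib-only variant of deEmojify (no unidecode)."""
    parts = []
    for ch in text:
        if ch.isascii():
            # ASCII characters pass through unchanged
            parts.append(ch)
            continue
        # Step 1 (swapped in for unidecode): decompose, then drop combining marks,
        # e.g. "š" -> "s" + U+030C COMBINING CARON -> "s"
        decomposed = unicodedata.normalize("NFKD", ch)
        stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
        if stripped and stripped.isascii():
            parts.append(stripped)
        else:
            # Steps 2 and 3: bracketed Unicode name, or "x" when there is no name
            parts.append("[" + unicodedata.name(ch, "x") + "]")
    return "".join(parts)


print(to_ascii_stdlib("šÅđ 可\U0001f60d"))
# -> sA[LATIN SMALL LETTER D WITH STROKE] [CJK UNIFIED IDEOGRAPH-53EF][SMILING FACE WITH HEART-SHAPED EYES]
```

Passing a default to unicodedata.name replaces the inner try/except ValueError, and collecting pieces in a list before a single "".join avoids repeated string concatenation.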

For example, take this string:

abcdšeđfčgžhÅiØjÆk 可爱!!!!!!!!😍😍😍😍😍😍😍😝

Run the function on it:

string = u'abcdšeđfčgžhÅiØjÆk \u53ef\u7231!!!!!!!!\U0001f60d\U0001f60d\U0001f60d\U0001f60d\U0001f60d\U0001f60d\U0001f60d\U0001f61d'
print(deEmojify(string))

This produces the following result:

abcdsedfcgzhAiOjAEk[x] Ke Ai !!!!!!!![SMILING FACE WITH HEART-SHAPED EYES][SMILING FACE WITH HEART-SHAPED EYES][SMILING FACE WITH HEART-SHAPED EYES][SMILING FACE WITH HEART-SHAPED EYES][SMILING FACE WITH HEART-SHAPED EYES][SMILING FACE WITH HEART-SHAPED EYES][SMILING FACE WITH HEART-SHAPED EYES][FACE WITH STUCK-OUT TONGUE AND TIGHTLY-CLOSED EYES]

To get a character's Unicode name directly, try this:

import unicodedata
print(unicodedata.name('\U0001f60d'))

The result is:

SMILING FACE WITH HEART-SHAPED EYES
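The standard unicodedata module also covers the related cases. unicodedata.name accepts an optional default that is returned instead of raising ValueError when a character has no name, and unicodedata.lookup maps a name back to its character. A short stdlib-only sketch (the variable names are illustrative):

```python
import unicodedata

emoji = "\U0001f60d"

# Character -> name (the call shown above)
name = unicodedata.name(emoji)
print(name)  # SMILING FACE WITH HEART-SHAPED EYES

# Name -> character: the reverse lookup
assert unicodedata.lookup(name) == emoji

# The "\N{...}" string escape resolves a character name in a literal
assert "\N{SMILING FACE WITH HEART-SHAPED EYES}" == emoji

# A character without a name (here the control character U+0085) returns
# the default instead of raising ValueError
print(unicodedata.name("\x85", "[x]"))  # [x]
```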