Using collections.Counter to count emojis with different colors

喜你入骨 提交于 2019-11-30 18:15:48

问题


I would like to use the collections.Counter class to count emojis in a string. It generally works fine, however, when I introduce colored emojis the color component of the emoji is separated from the emoji like so:

>>> import collections
>>> emoji_string = "👌🏻👌🏼👌🏽👌🏾👌🏿"
>>> emoji_counter = collections.Counter(emoji_string)
>>> emoji_counter.most_common()
[('👌', 5), ('🏻', 1), ('🏼', 1), ('🏽', 1), ('🏾', 1), ('🏿', 1)]

How can I make the most_common() function return something like this instead:

[('👌🏻', 1), ('👌🏼', 1), ('👌🏽', 1), ('👌🏾', 1), ('👌🏿', 1)]

I'm using Python 3.6


回答1:


You'll have to split your string into separate clusters. Each of your emoji is really two codepoints; the emoji and a EMOJI MODIFIER FITZPATRICK TYPE X codepoint:

>>> print(emoji_string[0])
👌
>>> print(emoji_string[1])
🏻
>>> print(emoji_string[:2])
👌🏻
>>> print(ascii(emoji_string[:2]))
'\U0001f44c\U0001f3fb'
>>> import unicodedata
>>> unicodedata.name(emoji_string[1])
'EMOJI MODIFIER FITZPATRICK TYPE-1-2'

You could use a regular expression to keep those with the preceding emoji:

import re

char_with_modifier = re.compile(r'(.[\U0001f3fb-\U0001f3ff]?)')
split_emoji = char_with_modifier.findall(emoji_string)

and count the result.

Demo:

>>> import re
>>> from collections import Counter
>>> emoji_string = "👌🏻👌🏼👌🏽👌🏾👌🏿"
>>> char_with_modifier = re.compile(r'(.[\U0001f3fb-\U0001f3ff]?)')
>>> Counter(char_with_modifier.findall(emoji_string))
Counter({'👌🏻': 1, '👌🏼': 1, '👌🏽': 1, '👌🏾': 1, '👌🏿': 1})


来源:https://stackoverflow.com/questions/43852668/using-collections-counter-to-count-emojis-with-different-colors

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!