Convert unicode small capitals to their ASCII equivalents

Submitted by 大憨熊 on 2021-02-08 10:22:56

Question


I have the following dataset:

'Fʀɪᴇɴᴅ',
 'ᴍᴏᴍ',
 'ᴍᴀᴋᴇs',
 'ʜᴏᴜʀʟʏ',
 'ᴛʜᴇ',
 'ᴄᴏᴍᴘᴜᴛᴇʀ',
 'ʙᴇᴇɴ',
 'ᴏᴜᴛ',
 'ᴀ',
 'ᴊᴏʙ',
 'ғᴏʀ',
 'ᴍᴏɴᴛʜs',
 'ʙᴜᴛ',
 'ʟᴀsᴛ',
 'ᴍᴏɴᴛʜ',
 'ʜᴇʀ',
 'ᴄʜᴇᴄᴋ',
 'ᴊᴜsᴛ',
 'ᴡᴏʀᴋɪɴɢ',
 'ғᴇᴡ',
 'ʜᴏᴜʀs',
 'sᴏᴜʀᴄᴇ',

I want to convert them into ASCII using a Python script, for example:

Fʀɪᴇɴᴅ - FRIEND
ᴍᴏᴍ - MOM

I have tried encoding and decoding, but that doesn't work. I have also tried this solution, but it doesn't solve my problem.


Answer 1:


Python doesn't provide a built-in way to convert small caps characters directly to their ASCII equivalents. However, it's possible to do this using str.translate.

To use str.translate, we need a mapping from the ordinal value of each small caps character to the corresponding ASCII character.

To get the ordinal values, we can construct the Unicode name of each character, look the character up in the unicodedata database with unicodedata.lookup, and call ord on it. Note that there is no small caps 'X' character, and in Python versions before 3.7 small caps 'Q' is missing from the database (it was added in Unicode 11.0, which Python 3.7 adopted).

>>> from string import ascii_uppercase
>>> import unicodedata as ud

>>> # Filter out unsupported characters; run only the line matching your version
>>> # Python < 3.7 (no small caps 'Q' or 'X')
>>> letters = (x for x in ascii_uppercase if x not in ('Q', 'X'))
>>> # Python >= 3.7 (no small caps 'X')
>>> letters = (x for x in ascii_uppercase if x != 'X')

>>> mapping = {ord(ud.lookup('LATIN LETTER SMALL CAPITAL ' + x)): x for x in letters}
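If you would rather not branch on the Python version, one option is to attempt every lookup and skip the names the interpreter does not know. This is a minimal sketch of that idea, not part of the original answer; the helper name build_small_caps_mapping is hypothetical. It relies on unicodedata.lookup raising KeyError for an unknown character name.

```python
from string import ascii_uppercase
import unicodedata as ud


def build_small_caps_mapping():
    """Map the code point of each available small capital to its ASCII letter."""
    mapping = {}
    for letter in ascii_uppercase:
        try:
            small_cap = ud.lookup('LATIN LETTER SMALL CAPITAL ' + letter)
        except KeyError:
            # This interpreter's Unicode database has no such character
            continue
        mapping[ord(small_cap)] = letter
    return mapping


mapping = build_small_caps_mapping()
```

The resulting dictionary has the same shape as the one built above (integer code points as keys, single ASCII letters as values), so it can be passed to str.maketrans in the same way.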

Once we have the mapping we can use it to make a translation table for str.translate, using str.maketrans, then perform the conversions.

>>> # Make a translation table
>>> tt = str.maketrans(mapping)
>>> # Use the table to "translate" strings to their ASCII equivalent.
>>> s = 'ᴍᴏɴᴛʜ'
>>> s.translate(tt)
'MONTH'
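The dataset in the question also mixes in characters that are not Latin small capitals: plain ASCII letters (the 'F' in the first word and the lowercase 's' in several others) and what appears to be U+0493, a Cyrillic letter used as a look-alike for small caps 'F'. The sketch below is one way to cover those cases, under the assumption that U+0493 should map to 'F' and that any remaining ASCII letters should simply be upper-cased. The names table and to_ascii_caps are hypothetical.

```python
from string import ascii_uppercase
import unicodedata as ud

# Build the small caps table, skipping names this Python does not know
table = {}
for letter in ascii_uppercase:
    try:
        table[ord(ud.lookup('LATIN LETTER SMALL CAPITAL ' + letter))] = letter
    except KeyError:
        pass

# Assumption: treat U+0493 (Cyrillic ghe with stroke) as a stand-in for 'F'
table[0x0493] = 'F'


def to_ascii_caps(word):
    """Translate small capitals, then upper-case any plain ASCII letters left."""
    return word.translate(table).upper()


words = [
    '\u1d0d\u1d0f\u1d0d',         # small caps M, O, M
    '\u1d0d\u1d00\u1d0b\u1d07s',  # small caps M, A, K, E plus ASCII 's'
    '\u0493\u1d0f\u0280',         # Cyrillic look-alike, then small caps O, R
]
print([to_ascii_caps(w) for w in words])
# ['MOM', 'MAKES', 'FOR']
```

Other look-alike characters, if the data contains any, would need their own entries in the table.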


Source: https://stackoverflow.com/questions/55717223/convert-unicode-small-capitals-to-their-ascii-equivalents
