Find out the unicode script of a character

既然无缘 2020-12-09 16:45

Given a Unicode character, what would be the simplest way to return its script (as "Latin", "Hangul", etc.)? unicodedata doesn't seem to provide this kind of feature.

5 Answers
  •  一生所求
    2020-12-09 17:29

    It seems to me that the Python unicodedata module contains tools for accessing the main file in the Unicode database, but nothing for the other files. As its documentation puts it: “The data in this database is based on the UnicodeData.txt file”.

    The script information is in the Scripts.txt file. It has a relatively simple format (described in UAX #44) and is not horribly large (131 kilobytes), so you might consider parsing it in your program. Note that the Unicode classification includes a “Common” script, which contains characters used across different scripts, such as punctuation marks.
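
As a rough illustration of the parsing approach described above, here is a minimal sketch. It assumes each data line has the form `XXXX..YYYY ; Script # comment` (or a single code point instead of a range), which is how Scripts.txt is laid out. The function names `parse_scripts` and `script_of` and the tiny `SAMPLE_LINES` excerpt are made up for this example; in practice you would feed the function the lines of a downloaded copy of Scripts.txt.

```python
import bisect

def parse_scripts(lines):
    """Parse lines in the Scripts.txt format into a sorted list of
    (start, end, script) tuples. Hypothetical helper for illustration."""
    ranges = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop trailing comments
        if not data:
            continue  # skip blank and comment-only lines
        code_points, script = (field.strip() for field in data.split(";")[:2])
        first, _, last = code_points.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start  # single code point
        ranges.append((start, end, script))
    ranges.sort()
    return ranges

def script_of(char, ranges):
    """Return the script name for a single character, or "Unknown" if
    the code point is not covered by any parsed range."""
    cp = ord(char)
    # Find the last range whose start is <= cp (0x110000 is above any code point).
    i = bisect.bisect_right(ranges, (cp, 0x110000, "")) - 1
    if i >= 0 and ranges[i][0] <= cp <= ranges[i][1]:
        return ranges[i][2]
    return "Unknown"

# A hand-picked excerpt in the same format, used here instead of the real file.
SAMPLE_LINES = """\
# Sample lines in the Scripts.txt format
0021..0023    ; Common # Po   [3] EXCLAMATION MARK..NUMBER SIGN
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Latin # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00AA          ; Latin # Lo       FEMININE ORDINAL INDICATOR
AC00..D7A3    ; Hangul # Lo [11172] HANGUL SYLLABLE GA..HANGUL SYLLABLE HIH
""".splitlines()

ranges = parse_scripts(SAMPLE_LINES)
print(script_of("A", ranges))   # Latin
print(script_of("한", ranges))  # Hangul
print(script_of("!", ranges))   # Common
```

With the real file, something like `with open("Scripts.txt", encoding="utf-8") as f: ranges = parse_scripts(f)` should work, where the path is whatever location you saved the file to. Code points that the file does not list fall back to "Unknown" here, which mirrors the default that Scripts.txt itself declares for unassigned code points.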
