Find out the Unicode script of a character

既然无缘 2020-12-09 16:45

Given a Unicode character, what would be the simplest way to return its script (as "Latin", "Hangul", etc.)? unicodedata doesn't seem to provide this kind of feature.

5 answers
  •  暖寄归人
    2020-12-09 17:30

    Often it is enough to detect whether a character belongs to one particular script, and for that you can use unicodedata.name with prefix matching. For example, to find out whether a letter is Cyrillic, you can use:

    import unicodedata

    class CharacterNamePrefixTester(dict):
        def __init__(self, prefix):
            self.prefix = prefix
        def __missing__(self, key):
            # Runs only on the first lookup of a character; the result is
            # stored so that later lookups are plain dict hits.
            self[key] = unicodedata.name(key, '').startswith(self.prefix)
            return self[key]

    >>> cyrillic = CharacterNamePrefixTester('CYRILLIC ')
    >>> cyrillic['й']
    True
    >>> cyrillic['a']
    False
    

    The dictionary is filled lazily: the result for a character is computed on its first lookup and then memoized, so future lookups of the same character are faster.
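
    The same lazy, memoized lookup could also be sketched with functools.lru_cache instead of a dict subclass. This is only an alternative illustration, not part of the approach above; the helper names name_starts_with and is_cyrillic are made up for this example.

```python
import unicodedata
from functools import lru_cache


@lru_cache(maxsize=None)
def name_starts_with(char, prefix):
    """Return True if the Unicode name of `char` starts with `prefix`.

    Characters without a name (e.g. control characters) fall back to ''
    and therefore never match a non-empty prefix.
    """
    return unicodedata.name(char, '').startswith(prefix)


def is_cyrillic(char):
    # Hypothetical convenience wrapper; the trailing space avoids matching
    # names that merely begin with the same letters.
    return name_starts_with(char, 'CYRILLIC ')


print(is_cyrillic('\u0439'))  # U+0439 CYRILLIC SMALL LETTER SHORT I
print(is_cyrillic('a'))       # U+0061 LATIN SMALL LETTER A
```

    As with the dictionary version, this only tests for a name prefix; it does not return the character's actual Unicode Script property.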
