Find out the Unicode script of a character

既然无缘 2020-12-09 16:45

Given a Unicode character, what would be the simplest way to return its script (as "Latin", "Hangul", etc.)? unicodedata doesn't seem to provide this kind of feature.

5 answers

暖寄归人 (original poster)

2020-12-09 17:30
Often it is enough to detect whether a particular script is used, and in that case you can match a prefix of the name returned by unicodedata.name. For example, to find out whether a letter is Cyrillic, you can use:
```
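import unicodedata

class CharacterNamePrefixTester(dict):
    """Maps a character to whether its Unicode name starts with a prefix."""
    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix

    def __missing__(self, key):
        # Computed on the first lookup, then stored in the dict itself.
        self[key] = unicodedata.name(key, '').startswith(self.prefix)
        return self[key]

>>> cyrillic = CharacterNamePrefixTester('CYRILLIC ')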
>>> cyrillic['й']
True
>>> cyrillic['a']
False
```
The dictionary is filled lazily, and each truth value is memoized, so repeated lookups of the same letter are faster.
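If you want a script label back rather than a yes/no answer, the same prefix idea can be extended. The following is only a rough sketch: `guess_script` and the `SCRIPT_PREFIXES` tuple are names made up for this example, the list of prefixes is an assumption you would need to extend yourself, and this is a heuristic on character names, not the real Unicode Script property. Memoization is done with functools.lru_cache instead of a dict subclass.

```python
import unicodedata
from functools import lru_cache

# Hypothetical, incomplete list of name prefixes to try; extend as needed.
SCRIPT_PREFIXES = ("LATIN", "CYRILLIC", "GREEK", "HANGUL", "HEBREW", "ARABIC")

@lru_cache(maxsize=None)
def guess_script(char):
    """Return the first prefix matching the character's Unicode name, else None."""
    # Unnamed characters yield '' and therefore match nothing.
    name = unicodedata.name(char, "")
    for prefix in SCRIPT_PREFIXES:
        if name.startswith(prefix + " "):
            return prefix
    return None
```

Characters whose names do not begin with one of the listed prefixes (digits, punctuation and so on) return None, so treat the result as a hint only.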