Given a unicode character what would be the simplest way to return its script (as \"Latin\", \"Hangul\" etc)? unicodedata doesn\'t seem to provide this kind of feature.
Oftentimes it is just enough to detect if a certain script is used, and then you can use the unicodedata.name with prefix matching. For example to find out whether a letter is Cyrillic, you can use
class CharacterNamePrefixTester(dict):
def __init__(self, prefix):
self.prefix = prefix
def __missing__(self, key):
self[key] = unicodedata.name(key, '').startswith(self.prefix)
return self[key]
>>> cyrillic = CharaterNamePrefixTester('CYRILLIC ')
>>> cyrillic['й']
True
>>> cyrillic['a']
False
The dictionary is built lazily but the truth values are memoized so that future lookups of the same letter will be faster.