Question
There are many situations where the user's language is not written in a Latin script (for example Greek, Russian, or Chinese). In most of these cases, sorting is done as follows:
- first, special characters and numbers (numbers in the local language, though...),
- second, words in the local script,
- last, any non-native words, such as "imported" French, English, or German words, in a general Unicode collation.
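The three-tier order described above can be sketched as a single key function. This is a minimal sketch that rests on two assumptions. First, the local script is identified by the first word of the first character's Unicode name; the hypothetical constant `LOCAL_SCRIPT = "GREEK"` holds it. Second, "special characters and numbers" means any character whose Unicode general category is not a letter (`L*`). Inside each tier it falls back to plain code-point order, which is not a true linguistic collation.

```python
import unicodedata

# Hypothetical setting: the "local" script, matched against the first word
# of the Unicode character name ("GREEK", "CYRILLIC", "CJK", ...).
LOCAL_SCRIPT = "GREEK"

def tier_key(s):
    """Sort key with three tiers, decided by the first character:
    0 = not a letter (digits, punctuation, symbols),
    1 = a letter of LOCAL_SCRIPT,
    2 = any other letter (e.g. imported Latin words).
    Within a tier, strings are compared by code point."""
    if not s:
        return (0, s)
    ch = s[0]
    category = unicodedata.category(ch)  # e.g. "Lu", "Ll", "Nd", "Po"
    if not category.startswith("L"):
        tier = 0
    elif unicodedata.name(ch, "").split(" ")[0] == LOCAL_SCRIPT:
        tier = 1
    else:
        tier = 2
    return (tier, s)

words = ["Zeus", "Αθήνα", "42", "άλφα", "!hello", "apple", "Βήτα"]
print(sorted(words, key=tier_key))
```

Code-point order puts "άλφα" after "Βήτα" because lowercase Greek letters come after all the capitals. One way to address that would be to replace the second tuple element with a locale-aware key, for example `locale.strxfrm` under a Greek locale, if one is installed.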
Or, more specifically for the remaining scripts:
Is it possible to choose the sort order based on script?
Example 1: Chinese script first, then Latin, Greek, Arabic (or even more...)
Example 2: Greek script first, then Latin, Arabic, Chinese (or even more...)
What is the most effective and Pythonic way to create a sort like any of these? (By «any» I mean either the simple «selected script first» with the rest in Unicode order, or the more complicated «selected script first» followed by a specified order for the remaining scripts.)
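One key factory that takes the preferred scripts in priority order can cover both variants. This is a hedged sketch. `script_of` and `make_script_key` are hypothetical helper names. The script is approximated by the first word of the character's Unicode name, because the `unicodedata` module does not expose the Unicode Script property directly. Any script left out of the list sorts after the listed ones in code-point order. A one-entry list therefore gives the simple «selected script first» case.

```python
import unicodedata

def script_of(ch):
    """Approximate a character's script by the first word of its Unicode
    name ('GREEK', 'LATIN', 'CJK', 'ARABIC', ...). Heuristic only."""
    return unicodedata.name(ch, "UNKNOWN").split(" ")[0]

def make_script_key(preferred):
    """Build a sort key from script names listed highest priority first.
    Unlisted scripts (and empty strings) share the rank after the last
    listed script and are ordered by plain code point."""
    rank = {script: i for i, script in enumerate(preferred)}
    unlisted = len(preferred)

    def key(s):
        script = script_of(s[0]) if s else None
        return (rank.get(script, unlisted), s)

    return key

words = ["Hello", "مرحبا", "你好", "Γειά", "Ωμέγα", "Apple"]

# Example 1: Chinese first, then Latin, Greek, Arabic.
print(sorted(words, key=make_script_key(["CJK", "LATIN", "GREEK", "ARABIC"])))
# Example 2, simple form: Greek first, everything else in code-point order.
print(sorted(words, key=make_script_key(["GREEK"])))
```

Building the rank table once inside the factory keeps the per-string key cheap. Changing the order only means passing a different list.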
Answer 1:
Interesting question. Here’s some sample code that classifies strings according to the writing system of the first character.
```python
import unicodedata

words = ["Japanese",       # English
         "Nihongo",        # Japanese, rōmaji
         "にほんご",        # Japanese, hiragana
         "ニホンゴ",        # Japanese, katakana
         "日本語",          # Japanese, kanji
         "Японский язык",  # Russian
         "जापानी भाषा"      # Hindi (Devanagari)
         ]

def wskey(s):
    """Return a sort key that is a tuple (n, s), where n is an int based
    on the writing system of the first character, and s is the passed
    string. Writing systems not addressed (Devanagari, in this example)
    go at the end."""
    sort_order = {
        # We leave gaps to make later insertions easy
        'CJK'      : 100,
        'HIRAGANA' : 200,
        'KATAKANA' : 200,  # hiragana and katakana at same level
        'CYRILLIC' : 300,
        'LATIN'    : 400
    }
    name = unicodedata.name(s[0], "UNKNOWN")
    first = name.split()[0]
    n = sort_order.get(first, 999999)
    return (n, s)

words.sort(key=wskey)
for s in words:
    print(s)
```
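Within one writing system, `wskey` compares strings by raw code point. As a result, every capital Latin or Cyrillic letter sorts before every lowercase one. The sketch below is a possible refinement. It assumes case-insensitive order is wanted and adds `str.casefold()` as a middle element of the key. `SORT_ORDER` and `wskey_casefold` are illustrative names, not part of the answer above.

```python
import unicodedata

# Same levels as in wskey, hoisted to module level (illustrative name).
SORT_ORDER = {"CJK": 100, "HIRAGANA": 200, "KATAKANA": 200,
              "CYRILLIC": 300, "LATIN": 400}

def wskey_casefold(s):
    """Like wskey, but within a writing system compare case-insensitively
    first; the original string only breaks ties."""
    name = unicodedata.name(s[0], "UNKNOWN") if s else "UNKNOWN"
    n = SORT_ORDER.get(name.split()[0], 999999)
    return (n, s.casefold(), s)

words = ["banana", "Cherry", "apple", "Яблоко", "банан"]
print(sorted(words, key=wskey_casefold))
```

Plain `wskey` would order the same list as `Яблоко, банан, Cherry, apple, banana`.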
In this example, I am sorting hiragana and katakana (the two Japanese syllabaries) at the same level. Ties within a level are broken by comparing the strings themselves, by code point. The katakana block (U+30A0 to U+30FF) comes after the hiragana block (U+3040 to U+309F), so pure-katakana strings will always come after pure-hiragana strings. If we wanted the same syllable (e.g., に and ニ) to sort together, that would be trickier.
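One way to approach that trickier case is sketched here as an assumption, not a complete solution. Katakana U+30A1 to U+30F6 lie exactly 0x60 code points above the matching hiragana. Folding katakana to hiragana for the comparison, and keeping the original string as a tie-breaker, therefore makes に and ニ sort together. `kana_fold` and `wskey_kana` are hypothetical names. Kana code-point order only approximates dictionary (gojūon) order, and the long-vowel mark ー is left unchanged.

```python
import unicodedata

SORT_ORDER = {"CJK": 100, "HIRAGANA": 200, "KATAKANA": 200,
              "CYRILLIC": 300, "LATIN": 400}

# Katakana U+30A1..U+30F6 map to hiragana U+3041..U+3096 (offset 0x60).
KATA_TO_HIRA = {cp: cp - 0x60 for cp in range(0x30A1, 0x30F7)}

def kana_fold(s):
    """Replace katakana with the hiragana of the same reading."""
    return s.translate(KATA_TO_HIRA)

def wskey_kana(s):
    name = unicodedata.name(s[0], "UNKNOWN") if s else "UNKNOWN"
    n = SORT_ORDER.get(name.split()[0], 999999)
    # Compare kana-folded text first; the raw string breaks ties, so a
    # hiragana string still precedes an otherwise identical katakana one.
    return (n, kana_fold(s), s)

words = ["ニホン", "にほんご", "かな", "カタカナ"]
print(sorted(words, key=wskey_kana))
```

Plain `wskey` would instead give `かな, にほんご, カタカナ, ニホン`.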
Source: https://stackoverflow.com/questions/51360878/how-to-sort-latin-after-local-language-in-python-3