Is there a standard way to sort by a non-English alphabet? For example, the Romanian alphabet is “a ă â b c…” [duplicate]

Submitted by 倾然丶 夕夏残阳落幕 on 2021-02-08 15:55:16

Question


Possible Duplicate:
How do I sort unicode strings alphabetically in Python?

As a citizen of the Rest-of-the-World, I'm really annoyed by the fact that computers aren't adapted by default to deal with international issues. Many sites still don't use Unicode and PHP is still in the Dark Ages.

When I want to sort a list of words or names in Romanian, I always have to write my own functions, which are hardly efficient. There must be some locale setting that makes sort functions obey the alphabetical order of the specified language, right?

I'm mainly interested in Python, Java and JavaScript.

EDIT: I found my answer for Python here, as pointed out by Chris Morgan.


Answer 1:


In Python, you can always use the sorted function with a key parameter. For example, Turkish has letters like 'ç', 'ı' and 'ş'. To sort according to the Turkish alphabet, I would write a key string in which the letters appear in alphabetical order, and sort each character by its position in that string, like this:

>>> letters="abcçdefgğhıijklmnoöprsştuüvyz" #Turkish alphabet
>>> sorted("açobzöğge")
['a', 'b', 'e', 'g', 'o', 'z', 'ç', 'ö', 'ğ'] #Python's default
>>> sorted("açobzöğge", key=lambda i: letters.index(i))
['a', 'b', 'ç', 'e', 'g', 'ğ', 'o', 'ö', 'z'] #With key parameter

Note: With Python 3, dealing with Unicode is easier.

Edit: as pointed out in the comments, this is more efficient with a dictionary, because each lookup is then constant time instead of a linear scan of the string:

>>> letters="abcçdefgğhıijklmnoöprsştuüvyz"
>>> d = {ch: i for i, ch in enumerate(letters)}  # letter -> position in the alphabet
>>> sorted("açobzöğge", key=d.get)
['a', 'b', 'ç', 'e', 'g', 'ğ', 'o', 'ö', 'z']
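The examples above sort single characters. A minimal sketch of how the same idea could be extended to whole words follows; the names TURKISH, RANK and alphabet_key are made up for illustration and are not part of the original answer. It assumes lowercase input, and it places any character missing from the alphabet after all known letters (with d.get as the key, such a character would yield None and raise a TypeError in Python 3).

```python
# Hypothetical helper names; the alphabet string is the one from the answer.
TURKISH = "abcçdefgğhıijklmnoöprsştuüvyz"
RANK = {ch: i for i, ch in enumerate(TURKISH)}  # letter -> position


def alphabet_key(word):
    """Build a sort key that compares words letter by letter.

    Each letter becomes a (rank, letter) pair. Characters that are not in
    the alphabet get rank len(RANK), so they sort after every known letter
    and are ordered among themselves by code point.
    """
    return [(RANK.get(ch, len(RANK)), ch) for ch in word]


words = ["çay", "cam", "zil", "şal", "su", "ıslak", "iz"]
print(sorted(words, key=alphabet_key))
# ['cam', 'çay', 'ıslak', 'iz', 'su', 'şal', 'zil']
```

This is still only a per-letter ordering: it knows nothing about uppercase letters, accents treated as secondary differences, or multi-character letters, which is where real collation rules come in.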



Answer 2:


There is no single, unified sorting algorithm that's correct for all languages, because many languages have very specific sorting rules.

It goes even further than that: even within a single language, the sorting rules can vary depending on what they are used for (for example, in German, dictionaries are sorted slightly differently from phone books).

The entire topic is called Collation. The Wikipedia article on Collating sequence is relevant as well.

There seems to be a project called python-collate that implements correct collation for many languages.
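As for the locale setting the question asks about: in Python, one standard-library route is locale.strxfrm, which transforms a string so that ordinary comparison follows the collation rules of the current LC_COLLATE locale. The sketch below is a rough illustration, not a definitive recipe. The helper name sort_with_locale is made up, locale names such as "ro_RO.UTF-8" are platform dependent and must be installed on the system, and the quality of the result depends on the operating system's locale data.

```python
import locale


def sort_with_locale(words, locale_name):
    """Sort words using the platform's collation rules for locale_name.

    Raises locale.Error if the requested locale is not available.
    The previous LC_COLLATE setting is restored afterwards.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


if __name__ == "__main__":
    try:
        # "ro_RO.UTF-8" is an assumed locale name; it may differ or be
        # missing on your system.
        print(sort_with_locale(["zebră", "ăsta", "apă"], "ro_RO.UTF-8"))
    except locale.Error:
        print("Romanian locale not installed on this system")
```

Note that setlocale changes process-wide state, so this approach is not safe to use from several threads at once.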



Source: https://stackoverflow.com/questions/6056842/is-there-a-standard-way-to-sort-by-a-non-english-alphabet-for-example-the-roma
