Sort dictionary by key using locale/collation

隐身守侯 提交于 2021-02-06 15:48:30

问题


The following code is ignoring the locale and Égypt goes at the end, what's wrong?

dict = {"United States": "United States", "Spain" : "Spain", "England": "England", "Égypt": "Égypt"}

import locale

# using your default locale (user settings)
locale.setlocale(locale.LC_ALL,"fr_FR")
print OrderedDict(sorted(dict.items(), key=lambda t: t[0], cmp=locale.strcoll))

That is the output:

OrderedDict([('England', 'England'), ('Spain', 'Spain'), ('United States', 'United States'), ('\xc3\x89gypt', '\xc3\x89gypt')])

回答1:


Consider the following...

import unicodedata
from collections import OrderedDict
dict = {"United States": "United States", "Spain" : "Spain", "England": "England", "Égypt": "Égypt"}

import locale

# using your default locale (user settings)
locale.setlocale(locale.LC_ALL,"fr_FR")

print OrderedDict(sorted(dict.items(),cmp= lambda a,b: locale.strcoll(unicodedata.normalize('NFD', unicode(a)[0]).encode('ASCII', 'ignore'),
                                                                       unicodedata.normalize('NFD', unicode(b)[0]).encode('ASCII', 'ignore'))))



回答2:


Here's a work-around.

Use unicode's normalization form canonical decomposition http://en.wikipedia.org/wiki/Unicode_equivalence#Normal_forms

# utf-8 <-> unicode is left as exercise to the reader
egypt = unicodedata.normalize("NFD", egypt)

sorted(['Egypt', 'E\xcc\x81gypt', 'US'])
['Egypt', 'E\xcc\x81gypt', 'US']

This doesn't actually take locale into consideration.

Beyond this, try newer Python (yes I know) or ICU library from Martijn's linked question and respective answers.



来源:https://stackoverflow.com/questions/22203550/sort-dictionary-by-key-using-locale-collation

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!