Python not sorting unicode properly. Strcoll doesn't help

执笔经年 2020-11-30 04:05

I've got a problem with sorting lists using unicode collation in Python 2.5.1 and 2.6.5 on OSX, as well as on Linux.

import locale   
locale.setlocale(loca         


        
6 Answers
  •  萌比男神i
    2020-11-30 04:51

    Just to add to tkopczuk's investigation: This is definitely a gcc bug, at least for version 4.2.1 on OS X 10.6.4. It can be reproduced by calling C strcoll() directly as in this snippet.

    EDIT: Still on the same system, I find that the problem occurs for the UTF-8 versions of de_DE, fr_FR, and pl_PL, but the sort order is correct for the ISO-8859-1 versions of fr_FR and de_DE. Unfortunately for the OP, ISO-8859-2 pl_PL is also buggy:

    The order for Polish ISO-8859 is:
    LATIN SMALL LETTER A
    LATIN SMALL LETTER Z
    LATIN SMALL LETTER A WITH OGONEK
    The LC_COLLATE culture and encoding settings were pl_PL, ISO8859-2.
    
    The order for Polish Unicode is:
    LATIN SMALL LETTER A
    LATIN SMALL LETTER Z
    LATIN SMALL LETTER A WITH OGONEK
    The LC_COLLATE culture and encoding settings were pl_PL, UTF8.
    
    The order for German Unicode is:
    LATIN SMALL LETTER A
    LATIN SMALL LETTER Z
    LATIN SMALL LETTER A WITH DIAERESIS
    The LC_COLLATE culture and encoding settings were de_DE, UTF8.
    
    The order for German ISO-8859 is:
    LATIN SMALL LETTER A
    LATIN SMALL LETTER A WITH DIAERESIS
    LATIN SMALL LETTER Z
    The LC_COLLATE culture and encoding settings were de_DE, ISO8859-1.
    
    The order for French ISO-8859 is:
    LATIN SMALL LETTER A
    LATIN SMALL LETTER E WITH ACUTE
    LATIN SMALL LETTER Z
    The LC_COLLATE culture and encoding settings were fr_FR, ISO8859-1.
    
    The order for French Unicode is:
    LATIN SMALL LETTER A
    LATIN SMALL LETTER Z
    LATIN SMALL LETTER E WITH ACUTE
    The LC_COLLATE culture and encoding settings were fr_FR, UTF8.
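
A comparable check can be sketched from Python itself, since `locale.strcoll` hands the comparison to the C library's collation routine. This is only a minimal sketch written for Python 3, not the snippet referred to above. The helper name `sort_with_locale` is hypothetical, and the locale names are assumptions based on OS X naming (other systems may spell them differently or not have them installed, in which case the helper returns `None`).

```python
import functools
import locale
import unicodedata


def sort_with_locale(words, locale_name):
    """Sort words using the C library's collation under locale_name.

    Returns None when the locale is not available on this system.
    (Hypothetical helper, for illustration only.)
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        # locale.strcoll delegates the comparison to the C library.
        return sorted(words, key=functools.cmp_to_key(locale.strcoll))
    finally:
        # Restore the previous collation locale whatever happens.
        locale.setlocale(locale.LC_COLLATE, previous)


if __name__ == "__main__":
    # Locale names are assumptions; adjust them to what your system provides.
    samples = [
        ("pl_PL.ISO8859-2", ["a", "z", "\u0105"]),  # a with ogonek
        ("pl_PL.UTF-8", ["a", "z", "\u0105"]),
        ("de_DE.UTF-8", ["a", "z", "\u00e4"]),      # a with diaeresis
        ("de_DE.ISO8859-1", ["a", "z", "\u00e4"]),
        ("fr_FR.ISO8859-1", ["a", "z", "\u00e9"]),  # e with acute
        ("fr_FR.UTF-8", ["a", "z", "\u00e9"]),
    ]
    for name, letters in samples:
        ordered = sort_with_locale(letters, name)
        if ordered is None:
            print("Locale %s is not available here." % name)
            continue
        print("The order for %s is:" % name)
        for ch in ordered:
            print(unicodedata.name(ch))
```

If the platform's collation tables are sound, the accented letter should land before LATIN SMALL LETTER Z in each listing. An order like the buggy ones above points at the C library and not at Python.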
    
