How to sort an array of UTF-8 strings?

刺人心 2020-11-27 04:32

I currently have no clue how to sort an array that contains UTF-8 encoded strings in PHP. The array comes from an LDAP server, so sorting via a database (would be no probl

8 answers
  •  猫巷女王i
    2020-11-27 05:00

    Ultimately, this problem cannot be solved in a simple way without recoding the strings (UTF-8 → Windows-1252 or ISO-8859-1), as suggested by ΤΖΩΤΖΙΟΥ, because of an apparent PHP bug discovered by Huppie. To summarize the problem, I created the following code snippet, which demonstrates that the strcoll() function is at fault when the Windows UTF-8 code page 65001 is used.

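    // Comparison callback that logs every strcoll() call and its result.
    function traceStrColl($a, $b) {
        $outValue=strcoll($a, $b);
        echo "$a $b $outValue\r\n";
        return $outValue;
    }

    // Match only a leading "WIN" so that "Darwin" (macOS) is not treated as Windows.
    $locale=(stripos(PHP_OS, 'win') === 0) ? 'German_Germany.65001' : 'de_DE.utf8';

    $string="ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜabcdefghijklmnopqrstuvwxyzäöüß";
    $array=array();
    // Split the UTF-8 string into single characters.
    for ($i=0; $i<mb_strlen($string, 'UTF-8'); $i++) {
        $array[]=mb_substr($string, $i, 1, 'UTF-8');
    }
    $oldLocale=setlocale(LC_COLLATE, "0");
    var_dump(setlocale(LC_COLLATE, $locale));
    usort($array, 'traceStrColl');
    setlocale(LC_COLLATE, $oldLocale);
    var_dump($array);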

    On Windows, the result is:

    string(20) "German_Germany.65001"
    a B 2147483647
    [...]
    array(59) {
      [0]=>
      string(1) "c"
      [1]=>
      string(1) "B"
      [2]=>
      string(1) "s"
      [3]=>
      string(1) "C"
      [4]=>
      string(1) "k"
      [5]=>
      string(1) "D"
      [6]=>
      string(2) "ä"
      [7]=>
      string(1) "E"
      [8]=>
      string(1) "g"
      [...]
    

    The same snippet works without problems on a Linux machine and produces the following output:

    string(10) "de_DE.utf8"
    a B -1
    [...]
    array(59) {
      [0]=>
      string(1) "a"
      [1]=>
      string(1) "A"
      [2]=>
      string(2) "ä"
      [3]=>
      string(2) "Ä"
      [4]=>
      string(1) "b"
      [5]=>
      string(1) "B"
      [6]=>
      string(1) "c"
      [7]=>
      string(1) "C"
      [...]
    

    The snippet also works with Windows-1252 or ISO-8859-1 encoded strings, provided that the encoding passed to the mb_* functions and the locale are changed accordingly.
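The recoding workaround mentioned above could look roughly like the sketch below. It sorts the original UTF-8 strings by comparing single-byte copies with strcoll(). The helper name sortUtf8ViaRecode(), its parameters, and the two locale names in the usage lines are assumptions made for illustration; the locales must actually be installed on the machine, and the mbstring extension is required.

```php
<?php
// Sketch only: sortUtf8ViaRecode() and its parameters are hypothetical names,
// not part of the original post. Requires the mbstring extension.
function sortUtf8ViaRecode(array $strings, string $locale, string $encoding = 'Windows-1252'): array
{
    $oldLocale = setlocale(LC_COLLATE, '0');   // remember the current collation locale
    setlocale(LC_COLLATE, $locale);            // returns false if $locale is not installed

    // Recode each string once; the single-byte copies are used only as sort keys.
    $values = array_values($strings);
    $keys = [];
    foreach ($values as $i => $s) {
        $keys[$i] = mb_convert_encoding($s, $encoding, 'UTF-8');
    }

    uasort($keys, 'strcoll');                  // sort the keys, keeping their indexes
    setlocale(LC_COLLATE, $oldLocale);         // restore the previous locale

    // Rebuild the result from the original UTF-8 strings in key order.
    $sorted = [];
    foreach (array_keys($keys) as $i) {
        $sorted[] = $values[$i];
    }
    return $sorted;
}

// Assumed locale names; adjust them to what the system provides.
$locale = (stripos(PHP_OS, 'win') === 0) ? 'German_Germany.1252' : 'de_DE.ISO-8859-1';
print_r(sortUtf8ViaRecode(['Zebra', 'Äpfel', 'apfel', 'Birne'], $locale));
```

Characters that have no Windows-1252 equivalent are replaced during recoding, so this approach only suits data that fits into the target single-byte encoding.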

    I filed a bug report on bugs.php.net: Bug #46165 strcoll() does not work with UTF-8 strings on Windows. If you experience the same problem, you can give your feedback to the PHP team on the bug-report page (two other, probably related, bugs have been classified as bogus, but I don't think that this bug is bogus ;-).
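To check whether an installation shows the same symptom, a small probe along the following lines might help. It tests whether strcoll() returns 2147483647 (INT_MAX), the value seen in the Windows trace and, as far as I know, the value the Microsoft C runtime uses to signal a failed comparison. The helper name strcollReturnsErrorValue() is hypothetical and the probe strings are simply the pair from the trace.

```php
<?php
// Sketch only: strcollReturnsErrorValue() is a hypothetical helper,
// not part of the original post.
// Returns true if strcoll() yields INT_MAX under $locale, false if it does not,
// and null if the locale cannot be set at all.
function strcollReturnsErrorValue(string $locale, string $a = 'a', string $b = 'B'): ?bool
{
    $oldLocale = setlocale(LC_COLLATE, '0');   // remember the current collation locale
    if (setlocale(LC_COLLATE, $locale) === false) {
        return null;                           // locale not available, nothing was changed
    }
    $result = strcoll($a, $b);
    setlocale(LC_COLLATE, $oldLocale);         // restore the previous locale
    return $result === 2147483647;
}

$locale = (stripos(PHP_OS, 'win') === 0) ? 'German_Germany.65001' : 'de_DE.utf8';
var_dump(strcollReturnsErrorValue($locale));
```

A result of true suggests that strcoll() cannot be trusted for that locale, and that recoding to a single-byte encoding is the safer route.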

    Thanks to all of you.
