Is it possible to sort an array with Unicode / UTF-8 characters in PHP using a natural order algorithm? For example (this array is already in the correct order):
Here is another workaround for those for whom setlocale doesn't work and who don't have the intl module enabled:
// The array to be sorted
$countries = array(
    'AT' => 'Österreich',
    'DE' => 'Deutschland',
    'CH' => 'Schweiz',
);
// Extend this array to your needs.
$utf_sort_map = array(
    "ä" => "a",
    "Ä" => "A",
    "Å" => "A",
    "ö" => "o",
    "Ö" => "O",
    "ü" => "u",
    "Ü" => "U",
);
uasort($countries, function ($a, $b) use ($utf_sort_map) {
    // Only the first character of each string is looked up in the map.
    $initial_a = mb_substr($a, 0, 1);
    $initial_b = mb_substr($b, 0, 1);
    if (isset($utf_sort_map[$initial_a]) || isset($utf_sort_map[$initial_b])) {
        if (isset($utf_sort_map[$initial_a])) {
            $initial_a = $utf_sort_map[$initial_a];
        }
        if (isset($utf_sort_map[$initial_b])) {
            $initial_b = $utf_sort_map[$initial_b];
        }
        if ($initial_a == $initial_b) {
            // Same folded initial: fall back to comparing the rest.
            return mb_substr($a, 1) < mb_substr($b, 1) ? -1 : 1;
        }
        else {
            return $initial_a < $initial_b ? -1 : 1;
        }
    }
    // Neither initial is in the map: plain byte-wise comparison.
    return $a < $b ? -1 : 1;
});
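The callback above only folds the first character of each string. A minimal sketch of the same idea applied to the whole string, assuming the same kind of hand-written map, could look like this. The helper name fold_for_sort is made up for illustration, and the map is still something you have to extend yourself.

```php
<?php
// Hypothetical helper (not part of PHP): replace every mapped character
// in the string, not just the first one. strtr() works byte-wise, which
// is safe for UTF-8 keys because it matches whole byte sequences.
function fold_for_sort(string $value, array $map): string
{
    return strtr($value, $map);
}

$countries = array(
    'AT' => 'Österreich',
    'DE' => 'Deutschland',
    'CH' => 'Schweiz',
);

$utf_sort_map = array(
    'ä' => 'a', 'Ä' => 'A', 'Å' => 'A',
    'ö' => 'o', 'Ö' => 'O',
    'ü' => 'u', 'Ü' => 'U',
);

uasort($countries, function ($a, $b) use ($utf_sort_map) {
    $result = strcmp(
        fold_for_sort($a, $utf_sort_map),
        fold_for_sort($b, $utf_sort_map)
    );
    // Tie-break on the original strings so the comparison stays consistent.
    return $result !== 0 ? $result : strcmp($a, $b);
});

print_r($countries); // keys end up in the order DE, AT, CH
```

This is still not locale aware. It just treats the mapped letters as their base letters everywhere in the string.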
The question is not as easy to answer as it seems at first glance. This is one of the areas where PHP's lack of Unicode support hits you with full force.
First of all, natsort(), as suggested by other posters, has nothing to do with sorting arrays of the type you want to sort. What you're looking for is a locale-aware sorting mechanism, because the order of strings with extended characters always depends on the language in use. Let's take German as an example: A and Ä can sometimes be sorted as if they were the same letter (DIN 5007/1), and sometimes Ä is sorted as if it were in fact "AE" (DIN 5007/2). In Swedish, in contrast, Ä comes at the end of the alphabet.
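To make the language dependence concrete, here is a small sketch that sorts the same three words under different rules. It assumes the intl extension is available (which not every setup has), and the helper name sort_with_collator is made up for this example. The locale identifiers are ICU ones; whether the 'de@collation=phonebook' variant (roughly DIN 5007/2) is honoured depends on the ICU data your PHP build ships with.

```php
<?php
// Hypothetical helper: sort a list of words with an ICU collator
// for the given locale and return the sorted copy.
function sort_with_collator(array $words, string $locale): array
{
    $collator = new Collator($locale);
    $collator->sort($words); // sorts in place and reindexes
    return $words;
}

$words = array('Zebra', 'Ärger', 'Afrika');

if (extension_loaded('intl')) {
    // German dictionary order: Ä is treated like A.
    print_r(sort_with_collator($words, 'de_DE')); // Afrika, Ärger, Zebra

    // Swedish: Ä sorts after Z.
    print_r(sort_with_collator($words, 'sv_SE')); // Afrika, Zebra, Ärger

    // German phonebook order: Ä should be treated like AE, which would
    // put Ärger before Afrika if the tailoring is available.
    print_r(sort_with_collator($words, 'de@collation=phonebook'));
}
```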
If you don't use Windows, you're in luck, as PHP provides some functions to do exactly this. Using a combination of setlocale(), usort(), strcoll() and the correct UTF-8 locale for your language, you get something like this:
$array = array('Àgile', 'Ágile', 'Âgile', 'Ãgile', 'Ägile', 'Agile', 'Test');
// setlocale() returns the NEW locale, so query the current one first.
$oldLocale = setlocale(LC_COLLATE, '0');
setlocale(LC_COLLATE, '<<your_RFC1766_language_code>>.utf8');
usort($array, 'strcoll');
setlocale(LC_COLLATE, $oldLocale);
Please note that it's mandatory to use the UTF-8 locale variant in order to sort UTF-8 strings. I reset the locale in the example above to its original value because setting a locale using setlocale() can introduce side effects in other running PHP scripts. Please see the PHP manual for more details.
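If you use this in more than one place, the save/sort/restore steps can be wrapped in a function. The following is only a sketch: the function name sort_with_locale is made up, and the locale names in the usage example ('de_DE.utf8', 'de_DE.UTF-8') are assumptions that depend on what is installed on your system (check with `locale -a`).

```php
<?php
// Hypothetical helper: sort $items with strcoll() under the first
// available locale from $candidates, then restore the previous locale.
function sort_with_locale(array $items, array $candidates): array
{
    // Passing '0' only queries the current setting.
    $previous = setlocale(LC_COLLATE, '0');
    try {
        // setlocale() tries each candidate and returns false if none works.
        if (setlocale(LC_COLLATE, $candidates) === false) {
            throw new RuntimeException('None of the requested locales is installed.');
        }
        usort($items, 'strcoll');
        return $items;
    } finally {
        // Always restore, even if sorting failed.
        setlocale(LC_COLLATE, $previous);
    }
}

$array = array('Àgile', 'Ágile', 'Âgile', 'Ãgile', 'Ägile', 'Agile', 'Test');
try {
    print_r(sort_with_locale($array, array('de_DE.utf8', 'de_DE.UTF-8')));
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```

The try/finally is there so that the original locale comes back even when something goes wrong in between.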
If you do use a Windows machine, there is currently no solution to this problem, and I assume there won't be one before PHP 6. Please see my own question on SO targeting this specific problem.
natsort($array);
$array = array_values($array);
I struggled with this issue when using asort.
Sorting:
Array
(
[xa] => África
[xo] => Australasia
[cn] => China
[gb] => Reino Unido
[us] => Estados Unidos
[ae] => Emiratos Árabes Unidos
[jp] => Japón
[lk] => Sri Lanka
[xe] => Europa Del Este
[xw] => Europa Del Oeste
[fr] => Francia
[de] => Alemania
[be] => Bélgica
[nl] => Holanda
[es] => España
)
put África at the end, because the UTF-8 bytes of Á compare greater than every ASCII letter. I solved it with this dirty little piece of code (which is fit for my purpose and my timeframe):
$sort = array();
foreach ($retval as $key => $value) {
    // Replace accented letters with plain ones to build the sort key.
    $v = str_replace(
        array('ä', 'Ä', 'Á', 'é', 'ö', 'ó', 'Ö', 'ü', 'Ü', 'ß', 'ñ'),
        array('a', 'A', 'A', 'e', 'o', 'o', 'O', 'u', 'U', 'S', 'n'),
        $value
    );
    $sort[] = "$v|$key|$value";
}
sort($sort);
$retval = array();
foreach ($sort as $value) {
    $arr = explode('|', $value);
    $retval[$arr[1]] = $arr[2];
}
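If the intl extension happens to be available, the replacement list does not have to be maintained by hand. A possible sketch, with made-up helper names strip_accents and asort_accent_insensitive: decompose each string (NFD) so that accents become separate combining marks, drop those marks, and sort on the result. Note that this does nothing for letters without a decomposition, such as ß.

```php
<?php
// Hypothetical helper: remove combining accents, e.g. "África" -> "Africa".
function strip_accents(string $value): string
{
    $decomposed = Normalizer::normalize($value, Normalizer::FORM_D);
    if ($decomposed === false) {
        return $value; // not valid UTF-8, leave untouched
    }
    // \p{Mn} matches the combining marks produced by the decomposition.
    return preg_replace('/\p{Mn}+/u', '', $decomposed) ?? $value;
}

// Hypothetical helper: like asort(), but comparing accent-stripped values.
function asort_accent_insensitive(array $values): array
{
    $sortKeys = array_map('strip_accents', $values); // array keys are preserved
    asort($sortKeys, SORT_STRING);
    $sorted = array();
    foreach ($sortKeys as $key => $unused) {
        $sorted[$key] = $values[$key];
    }
    return $sorted;
}

$regions = array(
    'xa' => 'África',
    'cn' => 'China',
    'de' => 'Alemania',
    'es' => 'España',
);

if (extension_loaded('intl')) {
    print_r(asort_accent_insensitive($regions)); // key order: xa, de, cn, es
}
```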
Nailed it!
$array = array('Ägile', 'Ãgile', 'Test', 'カタカナ', 'かたかな', 'Ágile', 'Àgile', 'Âgile', 'Agile');
function Sortify($string)
{
    return preg_replace('~&([a-z]{1,2})(acute|cedil|circ|grave|lig|orn|ring|slash|tilde|uml);~i', '$1' . chr(255) . '$2', htmlentities($string, ENT_QUOTES, 'UTF-8'));
}
array_multisort(array_map('Sortify', $array), $array);
Output:
Array
(
[0] => Agile
[1] => Ágile
[2] => Âgile
[3] => Àgile
[4] => Ãgile
[5] => Ägile
[6] => Test
[7] => かたかな
[8] => カタカナ
)
Even better:
if (extension_loaded('intl') === true)
{
    collator_asort(collator_create('root'), $array);
}
Thanks to @tchrist!
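Since the question asks for natural order specifically: the collator can also compare runs of digits by their numeric value. A minimal sketch, assuming intl is loaded; the function name natural_unicode_sort and the sample strings are made up for illustration.

```php
<?php
// Hypothetical helper: Unicode-aware sort that also treats digit
// sequences as numbers ("2" before "10"), keeping the array keys.
function natural_unicode_sort(array $values, string $locale = 'root'): array
{
    $collator = new Collator($locale);
    $collator->setAttribute(Collator::NUMERIC_COLLATION, Collator::ON);
    $collator->asort($values);
    return $values;
}

$exercises = array('Übung 10', 'Übung 2', 'Ubung 3');

if (extension_loaded('intl')) {
    print_r(natural_unicode_sort($exercises));
    // Values come out as: Übung 2, Ubung 3, Übung 10
}
```

The accent only acts as a tie-breaker here, so the number decides the order before U versus Ü does.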