Is it possible to sort an array of values using a specific collation in Ruby? I have a need to sort according to the da_DK collation.
Given the array %w(Aarhu
According to Wikipedia:
In the Danish and Norwegian alphabets, the same extra vowels as in Swedish (see below) are also present but in a different order and with different glyphs (..., X, Y, Z, Æ, Ø, Å). Also, "Aa" collates as an equivalent to "Å". The Danish alphabet has traditionally seen "W" as a variant of "V", but today "W" is considered a separate letter.
Ruby's default sort compares strings byte by byte and knows none of these rules, which throws off the sorting.
Do this to fix the problem:
names = %w(Aarhus Aalborg Assens)
names.sort_by { |w| w.gsub('Aa', 'Å') } # => ["Assens", "Aalborg", "Aarhus"]
Do something similar for any other compound character combinations that need to be converted to a single character.
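Here's a rough sketch of how that idea could be extended so that Æ, Ø and Å also land at the end of the alphabet in the Danish order. The names DANISH_ALPHABET, DANISH_RANK and danish_key are made up for this example, and it assumes Ruby 2.4 or later (for Unicode-aware downcase), that every "aa" is really the digraph for "å", and that unknown characters can simply sort last. Real da_DK collation has more rules than this, so treat it as an illustration, not a replacement for a proper collation library.

```ruby
# Hypothetical lookup table: each letter's position in the Danish alphabet.
DANISH_ALPHABET = ('a'..'z').to_a + %w[æ ø å]
DANISH_RANK = DANISH_ALPHABET.each_with_index.to_h

# Hypothetical helper: turns a word into an array of alphabet positions.
def danish_key(word)
  # Assumption: every "aa" stands for "å"; case is ignored.
  normalized = word.downcase.gsub('aa', 'å')
  # Characters outside the table get a rank past the end, so they sort last.
  normalized.each_char.map { |ch| DANISH_RANK.fetch(ch, DANISH_ALPHABET.size) }
end

names = %w[Ærø Aarhus Odense Østerbro Assens Aalborg]
sorted = names.sort_by { |w| danish_key(w) }
# => ["Assens", "Odense", "Ærø", "Østerbro", "Aalborg", "Aarhus"]
```

The key is an array of integers, and Ruby compares arrays element by element, so the words end up ordered by alphabet position instead of by byte value.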
The reason this works is that sort_by does a Schwartzian transform, so it actually sorts by the value the block returns, which, in this case, is the name with 'Aa' replaced with 'Å'. The replacement is temporary and is discarded once the array is sorted.
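To make the decorate-sort-undecorate idea concrete, here is roughly what sort_by does, written out by hand. This is only a sketch of the concept, not how Ruby implements it internally; the variable names decorated and manual are just for illustration.

```ruby
names = %w[Aarhus Aalborg Assens]

# 1. Decorate: compute the key once per element and pair it with the original.
decorated = names.map { |w| [w.gsub('Aa', 'Å'), w] }

# 2. Sort: compare only the precomputed keys.
# 3. Undecorate: throw the keys away and keep the original names.
manual = decorated.sort { |a, b| a.first <=> b.first }.map { |_key, w| w }
# => ["Assens", "Aalborg", "Aarhus"]

# Same result as letting sort_by do the work:
manual == names.sort_by { |w| w.gsub('Aa', 'Å') } # => true
```

Note that names itself is untouched, because gsub returns a new string and the keys only live for the duration of the sort.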
sort_by is very powerful, but it does have some overhead. For a simple sort you should use sort because it's faster. For sorts where you're comparing two simple values at the top level of an object, it becomes a wash whether you use sort or sort_by. If you have to do more complex calculations or dig around in an object, sort_by can prove to be faster. There isn't a hard-and-fast way to know which is better, so I strongly recommend testing with a benchmark if you have to sort large arrays or deal with objects, because the difference can be large, and sometimes sort can be the better choice.
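A benchmark along these lines is one way to check on your own data. It uses Ruby's standard Benchmark module; the random eight-letter words and the array size are arbitrary stand-ins, so swap in something that looks like your real input. I'm deliberately not showing timings, since they depend on the Ruby version and the machine.

```ruby
require 'benchmark'

# Arbitrary test data: 20,000 random lowercase words (an assumption for the demo).
letters = ('a'..'z').to_a
words = Array.new(20_000) { Array.new(8) { letters.sample }.join }

Benchmark.bm(24) do |x|
  # Simple case: no key computation at all.
  x.report('sort (plain)')         { words.sort }
  x.report('sort_by (identity)')   { words.sort_by { |w| w } }

  # Costlier case: the key is recomputed on every comparison with sort,
  # but only once per element with sort_by.
  x.report('sort (gsub block)')    { words.sort { |a, b| a.gsub('aa', 'å') <=> b.gsub('aa', 'å') } }
  x.report('sort_by (gsub key)')   { words.sort_by { |w| w.gsub('aa', 'å') } }
end
```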
EDIT:
Ruby, by itself, isn't going to do what you want, because it has no knowledge of the sort order of every character set out there. There's a discussion regarding incorporating IBM's ICU that explains why that is. If you want ICU's abilities, you could look into ICU4R. I haven't played with it, but it sounds like your only real solution in Ruby.
You might be able to do something with a database like Postgres. It supports various collation options, but older versions force you to declare the collation when you create the database. PostgreSQL 9.1 and later also accept a COLLATE clause on individual columns and expressions. Either way, that'd be an option, though it would be a pain.