I am trying to figure out a 'proper' way of sorting UTF-8 strings in Ruby on Rails.
In my application, I have a select box that is populated with countries. As my
Ruby performs string comparisons based on the byte values of characters, so in UTF-8 an accented letter sorts after every plain ASCII letter:
%w[à a e].sort
# => ["a", "e", "à"]
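To see why, it can help to look at the raw bytes being compared. This small sketch assumes the source file is UTF-8 (Ruby's default) and that "à" is typed in its precomposed form (U+00E0); a decomposed "a" plus combining accent would produce different bytes.

```ruby
# Print the UTF-8 bytes that String#<=> compares when sorting.
%w[a e à].each do |s|
  puts "#{s}: #{s.bytes.inspect}"
end
# a: [97]
# e: [101]
# à: [195, 160]
```

Since the first byte of "à" (195) is larger than the byte for "e" (101), a plain `sort` places "à" last.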
To collate strings according to a locale's rules, you can use the ffi-icu gem (Ruby bindings for ICU):
require "ffi-icu"
ICU::Collation.collate("it_IT", %w[à a e])
# => ["a", "à", "e"]
ICU::Collation.collate("de", %w[a s x ß])
# => ["a", "s", "ß", "x"]
Alternatively, create a collator once and use its compare method in a sort block:
collator = ICU::Collation::Collator.new("it_IT")
%w[à a e].sort { |a, b| collator.compare(a, b) }
# => ["a", "à", "e"]
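If adding a native ICU dependency is not an option, a crude stdlib-only approximation is to sort on an accent-stripped key. This is a swapped-in technique, not locale-aware collation: `naive_accent_insensitive_sort` is a hypothetical helper that decomposes each string with `String#unicode_normalize(:nfd)`, drops combining marks, and uses the original string as a tie-breaker. It handles "à" but not language-specific rules such as German "ß".

```ruby
# Hypothetical helper: crude, locale-agnostic fallback, NOT a replacement for ICU.
# Primary key: NFD-decomposed string with combining marks (\p{Mn}) removed,
# so "à" compares like "a". Secondary key: the original string, to break ties.
def naive_accent_insensitive_sort(strings)
  strings.sort_by do |s|
    [s.unicode_normalize(:nfd).gsub(/\p{Mn}/, ""), s]
  end
end

puts naive_accent_insensitive_sort(%w[à a e]).join(" ")
# a à e
puts naive_accent_insensitive_sort(%w[a s x ß]).join(" ")
# a s x ß   ("ß" has no decomposition, so the German order from ICU is not reproduced)
```

The second result shows the limitation: ICU's German collation places "ß" between "s" and "x", while this approximation still falls back to byte order.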
Update: To test how strings should collate according to a locale's rules, the ICU project provides this nice tool.