In my app (Rails 3.0.5, Ruby 1.8.7), I created an import tool to import CSV data from file.
Problem: I asked my users to export the CSV file from Excel in UTF-8 enco
For 1.9 it's obvious, you just tell it to expect utf8 and it will raise an error if it isn't:
begin
lines = CSV.read('bad.csv', :encoding => 'utf-8')
rescue ArgumentError
puts "My users don't listen to me!"
end
You can use Charlock Holmes, a character encoding detecting library for Ruby.
https://github.com/brianmario/charlock_holmes
To use it, you just read the file, and use the detect
method.
contents = File.read('test.xml')
detection = CharlockHolmes::EncodingDetector.detect(contents)
# => {:encoding => 'UTF-8', :confidence => 100, :type => :text}
You can also convert the encoding to UTF-8 if it is not in the correct format:
utf8_encoded_content = CharlockHolmes::Converter.convert contents, detection[:encoding], 'UTF-8'
This saves users from having to do it themselves before uploading it again.