Ruby read CSV file as UTF-8 and/or convert ASCII-8Bit encoding to UTF-8

醉酒成梦 2020-12-04 15:39

I'm using Ruby 1.9.2.

I'm trying to parse a CSV file that contains some French words (e.g. spécifié) and place the contents in a MySQL database.

3 Answers
  •  无人及你
    2020-12-04 15:59

    deceze is right: that is ISO-8859-1 (AKA Latin-1) encoded text. Try this:

    require 'csv'
    file_contents = CSV.read("csvfile.csv", col_sep: "$", encoding: "ISO8859-1")
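    With `encoding: "ISO8859-1"` alone, the parsed fields come back tagged as ISO-8859-1. Ruby's CSV also accepts an "external:internal" pair such as "ISO-8859-1:UTF-8", which transcodes the data to UTF-8 while it is read. A minimal sketch, assuming a throwaway temp file with made-up `$`-separated Latin-1 contents standing in for csvfile.csv:

```ruby
require 'csv'
require 'tempfile'

# Hypothetical stand-in for csvfile.csv: "$"-separated data saved as Latin-1 bytes.
file = Tempfile.new(['demo', '.csv'])
file.binmode
file.write("id$label\n1$Non sp\xE9cifi\xE9\n".b)
file.close

# "ISO-8859-1:UTF-8" = the file is ISO-8859-1, transcode to UTF-8 while reading,
# so every parsed field comes back as a UTF-8 string.
rows = CSV.read(file.path, col_sep: "$", encoding: "ISO-8859-1:UTF-8")
file.unlink

label = rows[1][1]
puts label           # prints "Non spécifié"
puts label.encoding  # prints "UTF-8"
```

    Transcoding at read time means the strings are already UTF-8 by the time they are handed to the MySQL insert, so no per-string fix-up is needed afterwards.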
    

    And if that doesn't work, you can use Iconv to fix up the individual strings with something like this:

    # Iconv ships with Ruby 1.9; on Ruby 2.0+ it needs the separate 'iconv' gem
    require 'iconv'
    utf8_string = Iconv.iconv('utf-8', 'iso8859-1', latin1_string).first
    

    If latin1_string is "Non sp\xE9cifi\xE9", then utf8_string will be "Non spécifié". Also, Iconv.iconv can unmangle whole arrays at a time:

    utf8_strings = Iconv.iconv('utf-8', 'iso8859-1', *latin1_strings)
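    Iconv is not available on Ruby 2.0+ without the separate `iconv` gem. A minimal Iconv-free sketch of the same two conversions swaps in `String#encode(destination, source)`, which treats the bytes as the source encoding and returns a new string without modifying the original. The values of `latin1_string` and `latin1_strings` are sample data made up for the demo:

```ruby
# Sample Latin-1 data, tagged ASCII-8BIT the way raw bytes usually are.
latin1_string  = "Non sp\xE9cifi\xE9".b
latin1_strings = ["sp\xE9cifi\xE9".b, "caf\xE9".b]

# Single string: transcode ISO-8859-1 -> UTF-8 into a new string;
# latin1_string itself is left untouched.
utf8_string = latin1_string.encode('UTF-8', 'ISO-8859-1')

# Whole array, the counterpart of Iconv.iconv(..., *latin1_strings).
utf8_strings = latin1_strings.map { |s| s.encode('UTF-8', 'ISO-8859-1') }

puts utf8_string              # prints "Non spécifié"
puts utf8_strings.join(", ")  # prints "spécifié, café"
```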
    

    In Ruby 1.9 and later you can skip Iconv entirely (it was deprecated in 1.9.3 and removed from the standard library in 2.0) and do this instead:

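    # force_encoding relabels latin1_string in place (bytes unchanged); encode returns a new UTF-8 string
    utf8_string = latin1_string.force_encoding('iso-8859-1').encode('utf-8')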
    

    where latin1_string is tagged as ASCII-8BIT but actually holds ISO-8859-1 bytes.
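    A common source of that ASCII-8BIT tag is reading a file in binary mode, e.g. with `File.binread`. A minimal sketch, assuming a throwaway temp file of Latin-1 bytes in place of the real CSV, that shows why the bytes cannot just be relabelled as UTF-8 and must be relabelled as ISO-8859-1 and then transcoded. It calls `force_encoding` on a `dup` because that method changes its receiver:

```ruby
require 'tempfile'

# Hypothetical stand-in for the real CSV: a temp file holding Latin-1 bytes.
file = Tempfile.new('latin1')
file.binmode
file.write("Non sp\xE9cifi\xE9".b)
file.close

# File.binread returns the raw bytes tagged as ASCII-8BIT.
raw = File.binread(file.path)
file.unlink

# The same bytes are not valid UTF-8, so relabelling them as UTF-8 is not enough.
looks_like_utf8 = raw.dup.force_encoding('UTF-8').valid_encoding?

# Relabel a copy as ISO-8859-1 (bytes unchanged), then transcode it to UTF-8.
utf8_string = raw.dup.force_encoding('ISO-8859-1').encode('UTF-8')

puts raw.encoding          # prints "ASCII-8BIT"
puts looks_like_utf8       # prints "false"
puts utf8_string           # prints "Non spécifié"
puts utf8_string.encoding  # prints "UTF-8"
```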
