How can I convert a string from windows-1252 to utf-8 in Ruby?

日久生厌 2020-12-03 12:04

I'm migrating some data from MS Access 2003 to MySQL 5.0 using Ruby 1.8.6 on Windows XP (writing a Rake task to do this).

Turns out the Windows string data is encoded …

  •  误落风尘
    2020-12-03 12:53

    Hi,

    I had the exact same problem.

    These tips helped me get going:

    Always check for the proper encoding name so that you feed your conversion tools the right input. When in doubt, you can list the encodings supported by iconv or recode with:

    $ recode -l
    

    or

    $ iconv -l
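If you later do the conversion inside Ruby 1.9 or newer, note that Ruby keeps its own table of encoding names, separate from the ones iconv and recode use. A minimal sketch for checking a name there; the helper name ruby_encoding_for is made up for illustration, and Encoding.name_list plays roughly the role of iconv -l:

```ruby
# Hypothetical helper: return Ruby's Encoding object for a name, or nil when
# Ruby does not know the name (Encoding.find raises ArgumentError for those).
def ruby_encoding_for(name)
  Encoding.find(name)
rescue ArgumentError
  nil
end

# Encoding.name_list returns every name and alias Ruby accepts,
# which is roughly Ruby's counterpart of `iconv -l`.
puts Encoding.name_list.include?('Windows-1252')

# Encoding.aliases maps alias => canonical name; collect the windows-1252 aliases.
cp1252_aliases = Encoding.aliases.select { |_alias, canonical| canonical == 'Windows-1252' }.keys
puts cp1252_aliases.inspect

puts ruby_encoding_for('windows-1252').inspect
```

Names are matched case-insensitively, so 'windows-1252', 'Windows-1252' and the alias 'CP1252' should all resolve to the same Encoding object.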
    

    Always start from your original file and convert a sample to work with:

    $ recode windows-1252..u8 < original.txt > sample_utf8.txt
    

    or

    $ iconv -f windows-1252 -t utf8 original.txt -o sample_utf8.txt
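The same sample conversion can also be done from Ruby itself. A minimal sketch of roughly what the iconv command above does, assuming Ruby 1.9.3 or later (for File.binread and File.binwrite); the helper name convert_cp1252_file is hypothetical:

```ruby
# Hypothetical helper, roughly equivalent to:
#   iconv -f windows-1252 -t utf8 original.txt -o sample_utf8.txt
def convert_cp1252_file(src, dst)
  raw = File.binread(src)               # raw bytes, tagged ASCII-8BIT
  raw.force_encoding('Windows-1252')    # relabel only: declare the bytes to be windows-1252
  utf8 = raw.encode('UTF-8')            # actually transcode the bytes to UTF-8
  File.binwrite(dst, utf8)              # write the bytes as-is (no newline translation)
end
```

Usage would be convert_cp1252_file('original.txt', 'sample_utf8.txt'). The split between force_encoding (which changes only the label) and encode (which rewrites the bytes) is the key step. Bytes that windows-1252 leaves undefined, such as 0x81, may make encode raise Encoding::UndefinedConversionError; passing undef: :replace to encode substitutes a replacement character instead.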
    

    Install Ruby 1.9, because it helps A LOT when it comes to encodings. Even if you don't use it in your program, you can always start an irb1.9 session and poke at the strings to see what the output is. In Ruby 1.9, the mode string passed to File.open can also name the file's encodings. Use it! This article helped a lot: http://blog.nuclearsquid.com/writings/ruby-1-9-encodings

    # 'r:windows-1252:utf-8' = read mode, external encoding windows-1252, internal encoding UTF-8,
    # so Ruby transcodes the file's contents to UTF-8 as it reads them.
    text = File.open('original.txt', 'r:windows-1252:utf-8') { |f| f.read }
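The question itself targets Ruby 1.8.6, where neither this mode string nor String#encode exists. A hedged sketch of a string-level helper (the name cp1252_to_utf8 is hypothetical) that uses String#encode on 1.9+ and, on 1.8, falls back to the Iconv library that shipped in Ruby 1.8's standard library; Iconv was removed from the standard library in Ruby 2.0, so the fallback branch assumes an old interpreter:

```ruby
# Hypothetical helper: convert a windows-1252 string to UTF-8 on Ruby 1.8 or 1.9+.
def cp1252_to_utf8(str)
  if str.respond_to?(:force_encoding)
    # Ruby 1.9+: relabel a copy of the bytes as windows-1252, then transcode to UTF-8.
    str.dup.force_encoding('Windows-1252').encode('UTF-8')
  else
    # Ruby 1.8: strings are plain bytes, so use Iconv.conv(to, from, string).
    require 'iconv'
    Iconv.conv('UTF-8', 'WINDOWS-1252', str)
  end
end
```

In a Rake task this could be applied to each text column read from Access before it is inserted into MySQL. Duplicating the string first means the caller's original value keeps its encoding label.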
    

    Have fun and swear a lot!
