Equivalent of Iconv.conv(“UTF-8//IGNORE”,…) in Ruby 1.9.X?

情歌与酒 2021-02-04 09:47

I'm reading data from a remote source and occasionally get some characters in another encoding. They're not important.

I'd like to get a "best guess" UTF-8 string…

6 answers
  •  栀梦 (OP)
    2021-02-04 10:30

    To ignore all the parts of a string that aren't correctly UTF-8 encoded, the following (as you originally posted) almost does what you want:

    string.encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => "")
    

    The caveat is that in Ruby 1.9 String#encode does nothing when the target encoding is the same as the string's current encoding, so if the string is already tagged as UTF-8 the invalid bytes are left in place. You therefore need to change encodings, going via an intermediate encoding that can still represent the full set of Unicode characters that UTF-8 can encode. (If you don't, you'll corrupt any characters that the intermediate encoding can't represent; 7-bit ASCII would be a really bad choice!) So go via UTF-16:

    string.encode('UTF-16', :invalid => :replace, :replace => '').encode('UTF-8')
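
    A minimal sketch of the UTF-16 round trip wrapped in a helper. The method name scrub_utf8 and the sample input bytes are hypothetical, chosen only for illustration. It checks that a valid multibyte character (é, bytes C3 A9) survives while a stray 0xFF byte, which can never occur in UTF-8, is dropped. The encoding magic comment keeps the file loadable on Ruby 1.9, whose default source encoding is US-ASCII.

```ruby
# encoding: utf-8

# Hypothetical helper: the UTF-16 round trip described above.
def scrub_utf8(string)
  # UTF-8 -> UTF-16 is a real conversion, so Ruby inspects every byte;
  # :invalid => :replace with an empty :replace string drops any byte
  # sequence that is not valid UTF-8. UTF-16 can represent every Unicode
  # character, so valid text is carried over intact.
  utf16 = string.encode('UTF-16', :invalid => :replace, :replace => '')
  # Convert back; the intermediate string is now valid, so nothing else is lost.
  utf16.encode('UTF-8')
end

# Sample input (an assumption): "café" as valid UTF-8 bytes (C3 A9 for é),
# a space, a stray 0xFF byte, then "ok".
raw = "caf\xC3\xA9 \xFFok".dup.force_encoding('UTF-8')

clean = scrub_utf8(raw)
puts raw.valid_encoding?    # prints "false"
puts clean                  # prints "café ok"
puts clean.valid_encoding?  # prints "true"
```

    The same call on text that is already valid UTF-8 returns an equal string, so it is safe to apply to every chunk read from the remote source.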
    
