Equivalent of Iconv.conv("UTF-8//IGNORE", …) in Ruby 1.9.X?

情歌与酒 2021-02-04 09:47

I'm reading data from a remote source and occasionally get some characters in another encoding. They're not important.

I'd like to get a "best guess" UTF-8 st

6 answers
  •  我寻月下人不归
    2021-02-04 10:34

    String#chars or String#each_char can also be used.

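    # Table 3-8. Use of U+FFFD in UTF-8 Conversion
    # (http://www.unicode.org/versions/Unicode6.2.0/ch03.pdf)
    # The trailing + keeps the expression open across the line break; a line
    # that starts with + would be parsed as a separate statement.
    str = "\x61"+"\xF1\x80\x80"+"\xE1\x80"+"\xC2"+
          "\x62"+"\x80"+"\x63"+"\x80"+"\xBF"+"\x64"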
    
    p [
      'abcd' == str.chars.collect { |c| (c.valid_encoding?) ? c : '' }.join,
      'abcd' == str.each_char.map { |c| (c.valid_encoding?) ? c : '' }.join
    ]
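
    Data read from a remote source often arrives tagged as binary rather than UTF-8, in which case valid_encoding? is true for every character and the filter above removes nothing. A minimal sketch of the same per-character filter wrapped in a helper, assuming the bytes are meant to be UTF-8; the name strip_invalid_utf8 and the sample input are hypothetical, not part of the answer:

```ruby
# Hypothetical helper: the name is illustrative only.
def strip_invalid_utf8(bytes)
  # dup so the caller's string is untouched, then retag the bytes as UTF-8.
  # force_encoding changes only the encoding label, not the bytes.
  text = bytes.dup.force_encoding(Encoding::UTF_8)
  # Keep only the characters that are valid in UTF-8.
  text.each_char.select(&:valid_encoding?).join
end

# Sample input: valid UTF-8 for "café", one stray 0xFF byte, tagged as binary.
raw = "caf\xC3\xA9 \xFF ok".dup.force_encoding(Encoding::BINARY)
clean = strip_invalid_utf8(raw)

p clean
p clean.valid_encoding?
```

    The stray byte is dropped and the spaces on either side of it are kept, so the result contains two consecutive spaces.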
    

    String#scrub can be used since Ruby 2.1.

    p [
      'abcd' == str.scrub(''),
      'abcd' == str.scrub{ |c| '' }
    ]
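
    When the invalid bytes should be marked rather than silently dropped, scrub can be called without an argument; for a UTF-8 string it then substitutes U+FFFD. The sketch below also shows one way to combine both approaches so the same code runs on Rubies older than 2.1. The helper name drop_invalid and the sample string are hypothetical:

```ruby
# Sample input: "xyz" with invalid UTF-8 bytes mixed in.
str = "x\xF1\x80\x80y\x80z".dup.force_encoding(Encoding::UTF_8)

# Without an argument, scrub replaces invalid sequences with U+FFFD
# (requires Ruby 2.1 or later).
marked = str.scrub
p marked.valid_encoding?

# Hypothetical wrapper: use scrub where it exists, otherwise fall back
# to the per-character filter.
def drop_invalid(text)
  if text.respond_to?(:scrub)
    text.scrub('')
  else
    text.each_char.select(&:valid_encoding?).join
  end
end

p drop_invalid(str)
```

    Checking respond_to?(:scrub) instead of RUBY_VERSION also picks up environments where scrub has been backported.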
    
