Equivalent of Iconv.conv("UTF-8//IGNORE",…) in Ruby 1.9.X?


情歌与酒 2021-02-04 09:47

I'm reading data from a remote source, and occasionally get some characters in another encoding. They're not important.

I'd like to get a "best guess" UTF-8 st…
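One commonly used Ruby 1.9 substitute for Iconv's `//IGNORE` flag is `String#encode(dst, src, options)` with `:invalid => :replace` and an empty `:replace` string. The sketch below assumes the incoming bytes are meant to be UTF-8; the helper name `utf8_ignore_invalid` is hypothetical. The detour through UTF-16BE is there because on 1.9 encoding a UTF-8 string to UTF-8 is a no-op that leaves invalid bytes in place.

```ruby
# encoding: utf-8
# Hypothetical helper: drop byte sequences that are not valid UTF-8,
# roughly what Iconv.conv("UTF-8//IGNORE", "UTF-8", str) was used for.
def utf8_ignore_invalid(str)
  # Passing 'UTF-8' as the source encoding reads the bytes as UTF-8 however
  # str is tagged (e.g. binary data from a socket); invalid input is replaced
  # with '' while transcoding to UTF-16BE.
  utf16 = str.encode('UTF-16BE', 'UTF-8', :invalid => :replace, :replace => '')
  # Transcode back; the result is valid UTF-8.
  utf16.encode('UTF-8')
end

raw = "caf\xC3\xA9 \xFF ok" # "\xFF" can never appear in valid UTF-8
clean = utf8_ignore_invalid(raw)
p clean.valid_encoding?
```

Unlike Iconv, this needs no extra library, and the explicit source encoding means the caller does not have to `force_encoding` binary input first.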

6 answers

我寻月下人不归 (OP)

2021-02-04 10:34

String#chars or String#each_char can also be used.

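# encoding: utf-8
# (Needed on Ruby 1.9, whose default source encoding is US-ASCII.)
# Table 3-8. Use of U+FFFD in UTF-8 Conversion
# http://www.unicode.org/versions/Unicode6.2.0/ch03.pdf
# The trailing + keeps both lines in one expression; a line that starts
# with + would be parsed as a separate statement and silently discarded.
str = "\x61" + "\xF1\x80\x80" + "\xE1\x80" + "\xC2" +
      "\x62" + "\x80" + "\x63" + "\x80" + "\xBF" + "\x64"

# Keep only the characters that are valid UTF-8.
p [
  'abcd' == str.chars.collect { |c| c.valid_encoding? ? c : '' }.join,
  'abcd' == str.each_char.map { |c| c.valid_encoding? ? c : '' }.join
]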
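The same each_char filter can substitute a marker instead of dropping bad input, which is closer to the U+FFFD convention in the Unicode table cited above. A minimal sketch that also runs on 1.9; the helper name `replace_invalid_chars` and its default replacement are assumptions. It calls `force_encoding` on a copy, so bytes read from a binary source are interpreted as UTF-8 without mutating the caller's string.

```ruby
# encoding: utf-8
# Hypothetical helper: replace (rather than drop) every invalid piece
# that String#each_char yields.
def replace_invalid_chars(str, replacement = "\uFFFD")
  # Work on a copy tagged as UTF-8 so the caller's string is untouched.
  utf8 = str.dup.force_encoding('UTF-8')
  utf8.each_char.map { |c| c.valid_encoding? ? c : replacement }.join
end

p replace_invalid_chars("a\xFFb", '?') # prints "a?b"
```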

Since Ruby 2.1, String#scrub can be used instead (it is not available in 1.9.X).

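# scrub(repl) replaces each invalid byte sequence with repl; the block
# form receives each invalid sequence and returns its replacement.
p [
  'abcd' == str.scrub(''),
  'abcd' == str.scrub { |bytes| '' }
]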
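Because the question targets 1.9.X, where scrub does not exist, code that has to run on several Ruby versions can check for the method with `respond_to?` and fall back to the each_char filter. A sketch under that assumption; the helper names `strip_invalid_utf8` and `hex_invalid_bytes` are hypothetical. The second helper uses the block form of scrub to make the dropped bytes visible, which can help when checking what the remote source is actually sending.

```ruby
# encoding: utf-8
# Hypothetical helper: remove invalid UTF-8 using String#scrub when the
# running Ruby has it (2.1+), else the each_char filter (1.9 and 2.0).
def strip_invalid_utf8(str)
  utf8 = str.dup.force_encoding('UTF-8')
  if utf8.respond_to?(:scrub)
    utf8.scrub('')
  else
    utf8.each_char.select { |c| c.valid_encoding? }.join
  end
end

# Hypothetical helper: replace each invalid sequence with its hex bytes,
# e.g. "\xFF" becomes "<ff>" (requires Ruby 2.1+ for String#scrub).
def hex_invalid_bytes(str)
  str.scrub { |bytes| "<#{bytes.unpack('H*').first}>" }
end

p strip_invalid_utf8("a\xFFb\xC2c") # prints "abc"
```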
