I'm reading data from a remote source, and occasionally get some characters in another encoding. They're not important.
I'd like to get a "best guess" UTF-8 st
String#chars or String#each_char can also be used: test each character with valid_encoding? and drop the ones that fail.
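# Test string from Table 3-8, "Use of U+FFFD in UTF-8 Conversion"
# (http://www.unicode.org/versions/Unicode6.2.0/ch03.pdf)
# The trailing + continues the expression onto the next line; a line that
# starts with + would be parsed as a separate statement and never appended.
str = "\x61"+"\xF1\x80\x80"+"\xE1\x80"+"\xC2"+
      "\x62"+"\x80"+"\x63"+"\x80"+"\xBF"+"\x64"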
p [
'abcd' == str.chars.collect { |c| (c.valid_encoding?) ? c : '' }.join,
'abcd' == str.each_char.map { |c| (c.valid_encoding?) ? c : '' }.join
]
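For data that arrives from a remote source, the same per-character check can be wrapped in a small helper. The following is only a sketch: `strip_invalid_chars` is a hypothetical name, not a Ruby built-in, and it assumes the incoming bytes are meant to be UTF-8 but arrive tagged as binary (ASCII-8BIT), which is common for socket and file reads.

```ruby
# Hypothetical helper (not part of Ruby): keep only the characters whose
# bytes are valid in the string's current encoding.
def strip_invalid_chars(str)
  str.each_char.select(&:valid_encoding?).join
end

# Assumed input: bytes tagged as binary, containing a valid two-byte "é"
# and one stray \xFF byte that is not valid UTF-8.
raw = "caf\xC3\xA9 \xFFok".b

# Relabel the bytes as UTF-8 first. force_encoding changes only the tag,
# never the bytes. Without this step every byte of a binary string counts
# as a valid character and nothing would be removed.
text = raw.dup.force_encoding(Encoding::UTF_8)

cleaned = strip_invalid_chars(text)
p cleaned
p cleaned.valid_encoding?
```

The valid multi-byte character survives because each_char groups its bytes into one character before valid_encoding? is asked; only the stray byte is dropped.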
As of Ruby 2.1, String#scrub does the same thing directly.
p [
'abcd' == str.scrub(''),
'abcd' == str.scrub{ |c| '' }
]
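A few other forms of scrub may be useful when the invalid bytes should be marked rather than silently dropped. This is a minimal sketch using a made-up input string (`broken` and the other variable names are illustrative only); it assumes Ruby 2.1 or later.

```ruby
# Illustrative input: one byte (\xFF) that can never appear in valid UTF-8.
broken = "abc\xFFdef"

# With no argument, scrub substitutes U+FFFD, the Unicode replacement character.
marked = broken.scrub
p marked

# The block form receives the invalid bytes, so they can be logged or
# rendered, here as hex in angle brackets.
hexed = broken.scrub { |bytes| "<#{bytes.unpack('H*').first}>" }
p hexed   # => "abc<ff>def"

# scrub! modifies the receiver in place instead of returning a new string.
copy = broken.dup
copy.scrub!('?')
p copy    # => "abc?def"
```

Keeping a visible marker such as U+FFFD makes it easier to notice later that data was lost, whereas replacing with an empty string hides the damage.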