In IRB, I'm trying the following:
1.9.3p194 :001 > foo = "\xBF".encode("utf-8", :invalid => :replace, :undef => :replace)
=> "\xBF"
I'd guess that "\xBF" already thinks it is encoded in UTF-8 so when you call encode, it thinks you're trying to encode a UTF-8 string in UTF-8 and does nothing:
>> s = "\xBF"
=> "\xBF"
>> s.encoding
=> #<Encoding:UTF-8>
\xBF isn't valid UTF-8, so this is, of course, nonsense. But if you use the three-argument form of encode:
encode(dst_encoding, src_encoding [, options] ) → str
[...] The second form returns a copy of str transcoded from src_encoding to dst_encoding.
You can force the issue by telling encode to ignore what the string thinks its encoding is and treat it as binary data:
>> foo = s.encode('utf-8', 'binary', :invalid => :replace, :undef => :replace)
=> "�"
Where s is the "\xBF" that thinks it is UTF-8 from above.
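As a rough sketch, the three-argument call above can be checked in a standalone script. This assumes Ruby 2.0 or later with the default UTF-8 source encoding; the variable names are only illustrative. valid_encoding? shows that the input really is broken and that the result is not:

```ruby
# Assumes the source file is UTF-8 (the default since Ruby 2.0),
# so the literal below is tagged UTF-8 even though the byte is invalid.
s = "\xBF"

tagged_as = s.encoding          # what the string claims to be
was_valid = s.valid_encoding?   # a lone 0xBF byte is not valid UTF-8

# Treat the bytes as binary, then transcode to UTF-8. Byte 0xBF has no
# UTF-8 mapping from binary, so :undef => :replace substitutes U+FFFD.
fixed = s.encode('utf-8', 'binary', :invalid => :replace, :undef => :replace)

puts tagged_as
puts was_valid
puts fixed.valid_encoding?
```

One caveat with this approach: because the source is declared binary, every byte above 0x7F counts as undefined, so any valid multi-byte UTF-8 characters that happen to be in the string should be replaced as well.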
You could also use force_encoding on s to force it to be binary and then use the two-argument encode:
>> s.encoding
=> #<Encoding:UTF-8>
>> s.force_encoding('binary')
=> "\xBF"
>> s.encoding
=> #<Encoding:ASCII-8BIT>
>> foo = s.encode('utf-8', :invalid => :replace, :undef => :replace)
=> "�"
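Note that force_encoding changes the receiver in place. If the original string needs to keep its UTF-8 tag, one option (a sketch, with illustrative variable names) is to retag a copy instead:

```ruby
original = "\xBF"   # tagged UTF-8, assuming a UTF-8 source file

# dup first so that force_encoding only retags the copy.
copy = original.dup.force_encoding('binary')

# The copy now claims to be binary, so the two-argument form
# performs a real binary -> UTF-8 conversion.
fixed = copy.encode('utf-8', :invalid => :replace, :undef => :replace)

puts original.encoding
puts copy.encoding
puts fixed.valid_encoding?
```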