In IRB, I\'m trying the following:
1.9.3p194 :001 > foo = \"\\xBF\".encode(\"utf-8\", :invalid => :replace, :undef => :replace)
=> \"\\xBF\"
1.
If you're only working with ascii characters you can use
>> "Hello \xBF World!".encode('utf-8', 'binary', :invalid => :replace, :undef => :replace)
=> "Hello � World!"
But what happens if we use the same approach with valid UTF8 characters that are invalid in ascii
>> "¡Hace \xBF mucho frío!".encode('utf-8', 'binary', :invalid => :replace, :undef => :replace)
=> "��Hace � mucho fr��o!"
Uh oh! We want frío to remain with the accent. Here's an option that keeps the valid UTF8 characters
>> "¡Hace \xBF mucho frío!".chars.select{|i| i.valid_encoding?}.join
=> "¡Hace mucho frío!"
Also in Ruby 2.1 there is a new method called scrub
that solves this problem
>> "¡Hace \xBF mucho frío!".scrub
=> "¡Hace � mucho frío!"
>> "¡Hace \xBF mucho frío!".scrub('')
=> "¡Hace mucho frío!"