Ruby String.encode still gives “invalid byte sequence in UTF-8”

说谎 2020-12-28 10:28

In IRB, I'm trying the following:

1.9.3p194 :001 > foo = "\xBF".encode("utf-8", :invalid => :replace, :undef => :replace)
 => "\xBF" 


        
3 answers
  •  轮回少年
    2020-12-28 11:06

    If you're only working with ASCII characters, you can declare the source encoding as binary, so that every non-ASCII byte gets replaced:

    >> "Hello \xBF World!".encode('utf-8', 'binary', :invalid => :replace, :undef => :replace)
    => "Hello � World!"
    

    But what happens if we use the same approach on a string that contains valid UTF-8 characters outside the ASCII range?

    >> "¡Hace \xBF mucho frío!".encode('utf-8', 'binary', :invalid => :replace, :undef => :replace)
    => "��Hace � mucho fr��o!"
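
    The doubled replacement characters come from the 'binary' source encoding: each byte is converted on its own, so both bytes of a two-byte UTF-8 character are treated as undefined. A minimal sketch of that accounting, assuming a UTF-8 source file (the default since Ruby 2.0); the variable names are illustrative only:

    ```ruby
    text = "¡Hace \xBF mucho frío!"

    # "¡" and "í" take two bytes each in UTF-8; the stray \xBF is one byte.
    high_bytes = text.bytes.count { |b| b >= 0x80 }

    # With 'binary' as the source encoding, every byte >= 0x80 is undefined
    # and becomes its own replacement character.
    replaced = text.encode('utf-8', 'binary', :invalid => :replace, :undef => :replace)

    puts high_bytes                 # 5
    puts replaced.count("\uFFFD")   # 5
    ```

    The two counts match, which is why one accented letter turns into two replacement characters.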
    

    Uh oh! We want frío to keep its accent. Here's an option that keeps the valid UTF-8 characters and drops only the invalid bytes:

    >> "¡Hace \xBF mucho frío!".chars.select{|i| i.valid_encoding?}.join
    => "¡Hace  mucho frío!"
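
    If you would rather mark the bad bytes than drop them silently, the same idea works with map instead of select. This is only a sketch; replace_invalid_chars and its placeholder argument are hypothetical names, not part of Ruby:

    ```ruby
    # Hypothetical helper: swap each invalid character for a placeholder
    # instead of removing it. Valid characters pass through untouched.
    def replace_invalid_chars(str, placeholder = "?")
      str.chars.map { |c| c.valid_encoding? ? c : placeholder }.join
    end

    puts replace_invalid_chars("¡Hace \xBF mucho frío!")   # ¡Hace ? mucho frío!
    ```

    Passing an empty string as the placeholder gives the same result as the select version above.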
    

    Also, Ruby 2.1 added a new method, String#scrub, that solves this problem directly:

    >> "¡Hace \xBF mucho frío!".scrub
    => "¡Hace � mucho frío!"
    >> "¡Hace \xBF mucho frío!".scrub('')
    => "¡Hace  mucho frío!"
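
    For code that still has to run on a Ruby older than 2.1, one commonly used workaround is a round trip through UTF-16, which forces encode to actually transcode and therefore apply :invalid => :replace. A rough sketch, assuming the input string is tagged as UTF-8; scrub_via_utf16 and clean_utf8 are hypothetical helper names:

    ```ruby
    # Hypothetical fallback: transcode to UTF-16BE and back so that invalid
    # UTF-8 bytes are replaced during the first conversion.
    def scrub_via_utf16(str, replacement = "\uFFFD")
      str.encode('UTF-16BE', 'UTF-8',
                 :invalid => :replace, :undef => :replace, :replace => replacement)
         .encode('UTF-8', 'UTF-16BE')
    end

    # Hypothetical wrapper: prefer String#scrub when the running Ruby has it.
    def clean_utf8(str, replacement = "\uFFFD")
      str.respond_to?(:scrub) ? str.scrub(replacement) : scrub_via_utf16(str, replacement)
    end

    puts clean_utf8("¡Hace \xBF mucho frío!", "")   # ¡Hace  mucho frío!
    ```

    The round trip costs two conversions, so scrub is the better choice wherever it is available.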
    
