Ruby to_json issue with error “illegal/malformed utf-8”

无人久伴 submitted on 2019-11-29 03:44:45

The single byte \xAE is not a valid UTF-8 sequence on its own; use the \u00AE escape instead:

"iPhone\u00AE"
#=> "iPhone®"

Or convert it accordingly:

"iPhone\xAE".force_encoding("ISO-8859-1").encode("UTF-8")
#=> "iPhone®"
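If you would rather not mutate the original string, a rough alternative is to pass the source encoding to String#encode directly. This is only a sketch, and it assumes the stray byte really came from ISO-8859-1 input; the variable names are illustrative.

```ruby
# Assumption: the byte 0xAE originated from ISO-8859-1 (Latin-1) data.
raw = "iPhone\xAE"

# encode(destination, source) transcodes the bytes as if they were Latin-1
# and returns a new string; raw itself is left untouched.
converted = raw.encode("UTF-8", "ISO-8859-1")

puts converted.valid_encoding?  # true
puts raw.valid_encoding?        # false, the original is unchanged
```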

Every string in Ruby has an underlying encoding. Depending on your LANG and LC_ALL environment variables, the interactive shell may read and interpret your strings in a particular encoding.

$ irb
1.9.3p392 :008 > __ENCODING__
 => #<Encoding:UTF-8>

(Ignore that I’m using Ruby 1.9 instead of 2.0; the ideas are the same.)

__ENCODING__ returns the current source encoding. Yours will probably also say UTF-8.
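To see which encodings are in play on your own machine, something like the following may help. It is a small sketch; the variable names are made up for illustration, and the values depend on your environment, so no output is shown.

```ruby
# Results vary with LANG / LC_ALL and how Ruby was started.
source_encoding = __ENCODING__         # encoding of this source file
external = Encoding.default_external   # usually derived from the locale
internal = Encoding.default_internal   # nil unless explicitly configured

puts "source:   #{source_encoding}"
puts "external: #{external}"
puts "internal: #{internal.inspect}"
puts "LANG=#{ENV['LANG'].inspect} LC_ALL=#{ENV['LC_ALL'].inspect}"
```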

When you create string literals that contain byte escapes (such as \xAE), Ruby interprets those bytes according to the string's encoding:

1.9.3p392 :003 > a = {"description" => "iPhone\xAE"}
 => {"description"=>"iPhone\xAE"}
1.9.3p392 :004 > a["description"].encoding
 => #<Encoding:UTF-8>
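A quick way to confirm that a string is tagged as UTF-8 but does not actually contain valid UTF-8 is to look at its bytes and call valid_encoding?. A minimal sketch, assuming a UTF-8 source file:

```ruby
s = "iPhone\xAE"

p s.encoding         # the encoding tag Ruby attached to the literal
p s.bytes.last       # 174, i.e. 0xAE
p s.valid_encoding?  # false: 0xAE is a continuation byte with no lead byte
```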

So Ruby tries to treat the byte \xAE at the end of your literal as part of a UTF-8 byte stream, but on its own it is invalid. See what happens when I try to print it:

1.9.3-p392 :001 > puts "iPhone\xAE"
iPhone�
 => nil
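The same invalid byte is what makes to_json fail. The sketch below assumes the standard json library, where the failure surfaces as JSON::GeneratorError; the exact message text varies between versions, so it is printed rather than shown.

```ruby
require "json"

bad = { "description" => "iPhone\xAE" }

begin
  bad.to_json
  outcome = :ok
rescue JSON::GeneratorError => e
  # The generator refuses strings that are not valid UTF-8.
  outcome = :generator_error
  puts "to_json failed: #{e.class}"
end
```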

You need to provide the registered mark character as valid UTF-8, either by using the real character or by providing its two UTF-8 bytes:

1.9.3-p392 :002 > a = {"description1" => "iPhone®", "description2" => "iPhone\xc2\xae"}
 => {"description1"=>"iPhone®", "description2"=>"iPhone®"}
1.9.3-p392 :005 > a.to_json
 => "{\"description1\":\"iPhone®\",\"description2\":\"iPhone®\"}"
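To check that the two-byte escape and the \u00AE escape really describe the same character, you can compare them directly. A small sketch with illustrative variable names:

```ruby
by_bytes     = "iPhone\xC2\xAE"  # the two UTF-8 bytes of the registered mark
by_codepoint = "iPhone\u00AE"    # the Unicode code point U+00AE

p by_bytes == by_codepoint                       # true
p by_bytes.valid_encoding?                       # true
p "\u00AE".bytes.map { |b| format("%02X", b) }   # ["C2", "AE"]
```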

Or, if your input is ISO-8859-1 (Latin 1) and you know it for sure, you can tell Ruby to interpret your string as another encoding:

1.9.3-p392 :006 > a = {"description1" => "iPhone\xAE".force_encoding('ISO-8859-1') }
 => {"description1"=>"iPhone\xAE"}
1.9.3-p392 :007 > a.to_json
 => "{\"description1\":\"iPhone®\"}"
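If many values may carry Latin-1 bytes, you could normalize them before calling to_json. The helper below, to_utf8, is hypothetical and not part of Ruby or the json library; it is a sketch that assumes any string which is not valid UTF-8 is really ISO-8859-1, which you would need to know about your own data.

```ruby
require "json"

# Hypothetical helper: returns a valid UTF-8 copy of str.
# Assumption: bytes that are not valid UTF-8 are ISO-8859-1.
def to_utf8(str, fallback = Encoding::ISO_8859_1)
  utf8 = str.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  # Reinterpret the raw bytes as the fallback encoding and transcode.
  str.encode(Encoding::UTF_8, fallback)
end

record = { "description" => "iPhone\xAE" }
clean  = record.transform_values { |v| v.is_a?(String) ? to_utf8(v) : v }

puts clean.to_json
```

Strings that are already valid UTF-8 pass through unchanged, so the helper can be applied to mixed input. Note that transform_values requires Ruby 2.4 or later.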

Hope it helps.
