How to convert character encoding with Ruby 1.9

Submitted by 限于喜欢 on 2019-11-30 08:56:51

As the exception points out, your string is encoded as ASCII-8BIT, so you need to change its encoding. There is a long story behind that, but if you just want a quick solution, call force_encoding on the string before you do any processing:

s = "Learn Objective\xE2\x80\x93C on the Mac"
# => "Learn Objective\xE2\x80\x93C on the Mac"
s.encoding
# => #<Encoding:ASCII-8BIT>
s.force_encoding 'utf-8'
# => "Learn Objective–C on the Mac"
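
Keep in mind that force_encoding only changes the encoding label; it never touches the bytes. A minimal sketch of how you might check that the relabeled string really is valid UTF-8 is below. The helper name relabel_as_utf8 and the "caf\xE9" sample are made up for illustration and are not part of Ruby or the original question:

```ruby
# Hypothetical helper (not part of Ruby): relabel a binary string as UTF-8
# and report whether its bytes actually form valid UTF-8.
def relabel_as_utf8(bytes)
  text = bytes.dup.force_encoding('UTF-8') # dup so the caller's string is left alone
  [text, text.valid_encoding?]
end

# The string from the example above, explicitly tagged as binary.
raw = "Learn Objective\xE2\x80\x93C on the Mac".dup.force_encoding('ASCII-8BIT')
text, ok = relabel_as_utf8(raw)
# ok is true: E2 80 93 is the UTF-8 byte sequence for an en dash.

# Assumed sample: a Latin-1 encoded string, whose bytes are not valid UTF-8.
bad = "caf\xE9".dup.force_encoding('ASCII-8BIT')
_, bad_ok = relabel_as_utf8(bad)
# bad_ok is false: the label changed, but the bytes are still not UTF-8.
```

If valid_encoding? returns false, the bytes were probably written in some other encoding, and relabeling alone will not help.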

Mladen's solution works only if everything encoded as ASCII-8BIT can actually be converted directly to UTF-8. It breaks when the string contains characters that are 1) invalid or 2) undefined in UTF-8. However, this will work (in 1.9.2 and up):

new_str = s.encode('utf-8', 'binary', :invalid => :replace, 
  :undef => :replace, :replace => '')

ASCII-8BIT is effectively binary. This code converts the string to UTF-8 while handling invalid and undefined characters. The :invalid option specifies that invalid characters are replaced, the :undef option specifies that undefined characters are replaced, and the :replace option defines what they are replaced with. In this case, I opted to simply remove them.
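
To see what that call does to the string from the question, here is a small sketch. One thing worth noticing: because the source encoding is given as binary, every byte above 0x7F counts as undefined, so the three bytes of the en dash are dropped as well, even though they would be valid UTF-8. The second call, which uses '?' as the replacement, is only an illustration to make the affected bytes visible:

```ruby
# Start from raw bytes, tagged as ASCII-8BIT (Ruby's binary encoding).
s = "Learn Objective\xE2\x80\x93C on the Mac".dup.force_encoding('ASCII-8BIT')

# Same call as above: drop every byte that has no UTF-8 mapping.
stripped = s.encode('utf-8', 'binary', :invalid => :replace,
  :undef => :replace, :replace => '')
# stripped => "Learn ObjectiveC on the Mac"

# With a visible placeholder, each affected byte shows up individually.
marked = s.encode('utf-8', 'binary', :invalid => :replace,
  :undef => :replace, :replace => '?')
# marked => "Learn Objective???C on the Mac"
```

So this approach always yields a valid UTF-8 string, but at the cost of losing non-ASCII characters; if the bytes are already UTF-8, force_encoding keeps them.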
