Adding a backslash to fix character encoding in a Ruby string

蹲街弑〆低调 submitted on 2019-12-06 13:20:39

Warning, the following is not really pretty.

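str = "u00a362 000? you must be joking"
# Wrap each u00XX escape (XX = two hex digits) in markers, then split on them
# so that every escape ends up as its own array element.
split_unicode = str.gsub(/(u00[0-9a-fA-F]{2})/, "split_here\\1split_here").split(/split_here/)
final = split_unicode.map do |elem|
  # Only convert elements that are exactly one escape; pass the rest through.
  if elem =~ /\Au00[0-9a-fA-F]{2}\z/
    # Parse the two hex digits as a code point and encode it as UTF-8.
    [elem.sub(/u00/, '').hex].pack("U*")
  else
    elem
  end
end
puts final.join  # prints: £62 000? you must be joking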

The idea is to find the u00xx sequences and parse their two hex digits into integer code points. From there, the pack method with the "U*" directive turns each code point into the right Unicode character.
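To make the two conversion steps concrete, here is a minimal sketch that handles a single escape in isolation. The variable names are only illustrative and do not appear in the answer above.

```ruby
# Illustrative variable names, not part of the original answer.
hex_digits = "u00a3".sub("u00", "")   # strip the prefix, leaving "a3"
code_point = hex_digits.hex           # String#hex parses hex digits: 163 (0xA3)
char       = [code_point].pack("U*")  # "U" packs a code point as a UTF-8 character

puts char
```

String#hex also accepts an optional "0x" prefix, which is why the longer snippet works whether or not the prefix is prepended.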

It can also be crunched into a horrible one-liner!

puts str.gsub(/(u00[0-9a-fA-F]{2})/, "split_here\\1split_here").split(/split_here/).map { |elem| elem =~ /\Au00[0-9a-fA-F]{2}\z/ ? [elem.sub(/u00/, '').hex].pack("U*") : elem }.join

There might be a better solution (I hope!) but this one works.
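One possibly simpler variant skips the split/join round trip by passing a block to gsub. This is only a sketch: it assumes every escape is a literal "u00" followed by exactly two hex digits, and decode_u00_escapes is a hypothetical helper name, not something from the question or from any library.

```ruby
# Hypothetical helper (not from the original answer): replaces each literal
# "u00XX" sequence, where XX is two hex digits, with the character whose
# code point is 0xXX.
def decode_u00_escapes(str)
  str.gsub(/u00([0-9a-fA-F]{2})/) do
    # $1 holds the two captured hex digits; String#hex converts them to an
    # Integer, and pack("U") encodes that code point as UTF-8.
    [$1.hex].pack("U")
  end
end

puts decode_u00_escapes("u00a362 000? you must be joking")
```

Like the original, this will also rewrite any ordinary text that happens to contain "u00" followed by two hex digits, so it is only safe when the input is known not to contain such sequences by accident.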

Try the Iconv library for converting the incoming string. You might also take a look at the stringex gem: its methods "go the other way", but it may provide the mappings you're looking for. That said, if the encoding is already corrupted, it can be impossible to recover the original text.
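Note that Iconv was removed from Ruby's standard library in Ruby 2.0; the built-in String#encode covers the same kind of conversion. The sketch below shows that route, assuming purely for illustration that the incoming bytes are ISO-8859-1. The real source encoding is not stated in the question, and this only helps when the problem is a byte-level encoding mismatch rather than a missing backslash.

```ruby
# Assumption for illustration: the incoming bytes are ISO-8859-1,
# where the single byte 0xA3 is the pound sign.
raw = "\xA362 000? you must be joking".b

# String#encode(destination, source) interprets the bytes as `source`
# and transcodes them to `destination`, returning a new string.
utf8 = raw.encode("UTF-8", "ISO-8859-1")

puts utf8
```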
