How to convert text to unicode in Rails?

Posted by 亡梦爱人 on 2020-01-04 04:17:14

Question


In my database, I have the following entry

id | name       | info
1  | John Smith | Çö ¿¬¼

As you can tell, the info column displays incorrectly; the text is actually Korean. In Chrome, when I switch the browser encoding from UTF-8 to Korean ('euc-kr', I think), I can view the text like this:

id | name       | info
1  | John Smith | 횉철 쩔짭쩌

I then manually copy the text into the info in the database and save, and now I can view it in UTF-8, without switching my browser's encoding.

Awesome. Now I'd like to get that same thing done in Rails, not manually. So starting with the original entry again, I go to the console and type:

require 'iconv'
u = User.find(1)
info = u.info
new_info = Iconv.iconv('euc-kr','UTF-8', info)
u.update_attribute('info', new_info)

However, what I end up with is something resembling \x{A2AF}\x{A8FA}\x{A1C6} \x{A2A5}\x{A8A2} in the database, not 횉철 쩔짭쩌.

I have only a very basic understanding of Unicode and encodings.

Can someone please explain what's going on here and how to get around that? The desired result is what I achieved manually.

Thanks!


Answer 1:


Wow. I'm beating myself over the head now. After hours of trying to resolve this, I finally figured it out myself a few minutes after I posted a question here.

The solution consists of three simple steps:

STEP 1:

I almost had it right. I shouldn't be converting from UTF-8 to euc-kr, but the other way around. Iconv.iconv takes the target encoding first and the source encoding second, so the call becomes:

Iconv.iconv('UTF-8', 'euc-kr', info)
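
Iconv was removed from Ruby's standard library in Ruby 2.0 (it remains available as the iconv gem), so here is a minimal sketch of the same direction check using the built-in String#encode, which also takes the target encoding first and the source encoding second. The sample text 한국어 is made up for illustration and is not the row from the question.

```ruby
# Sample text (hypothetical, not the data from the question).
korean = "한국어"

# Simulate what a legacy source holds: the same text as raw EUC-KR bytes.
euc_kr_bytes = korean.encode("EUC-KR").b

# Right direction: target first, source second, like Iconv.iconv(to, from, str).
fixed = euc_kr_bytes.encode("UTF-8", "EUC-KR")

puts fixed == korean        # true when the original text is recovered
puts euc_kr_bytes.bytesize  # EUC-KR uses 2 bytes per Hangul syllable
puts fixed.bytesize         # UTF-8 uses 3 bytes per Hangul syllable
```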

STEP 2:

I might still run into some errors in the text, so to be safe I append //IGNORE to the target encoding, which tells Iconv to skip characters it cannot convert:

Iconv.iconv('UTF-8//IGNORE', 'euc-kr', info)
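
For Ruby versions without Iconv, a rough equivalent of //IGNORE is the invalid:, undef:, and replace: options of String#encode. The sketch below assumes a hypothetical input with one stray byte (0xFF) appended. Note that replacing with an empty string drops data silently, so it may be worth logging the rows where this happens.

```ruby
# Hypothetical damaged input: valid EUC-KR bytes followed by a stray 0xFF byte.
damaged = "한국어".encode("EUC-KR").b + "\xFF".b

# Without options, the bad byte makes the conversion raise.
begin
  damaged.encode("UTF-8", "EUC-KR")
  strict_failed = false
rescue EncodingError
  strict_failed = true
end

# With options, bytes that cannot be converted are replaced by an empty string.
lenient = damaged.encode("UTF-8", "EUC-KR",
                         invalid: :replace, undef: :replace, replace: "")

puts strict_failed
puts lenient.valid_encoding?
```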

Finally, I actually get REAL KOREAN TEXT, yay! The problem is, when I try to insert it into the database, it's still inserting something along the lines of:

UPDATE `users` SET `info` = '--- \n- \"\\xEC\\xB2\\xA0\\xEC\\xB1\\x8C...' etc...

That happens even though I have the right text. So why is that? On to the last step.

STEP 3:

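Turns out Iconv.iconv returns an array of strings, not a single string, which is why the UPDATE above contains what looks like a serialized array ('--- \n- ...') instead of plain text. So we merge it with join: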

Iconv.iconv('UTF-8//IGNORE', 'euc-kr', info).join
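
To see why the array matters, here is a small sketch that does not need Iconv. It assumes the '--- \n- ' prefix in the UPDATE statement is YAML produced when an Array is written to a string column; the exact mechanism depends on the Rails version. The one-element array and its sample text are stand-ins.

```ruby
require "yaml"

# Stand-in for the return value of Iconv.iconv: an Array holding one String.
converted = ["철수"]

# An Array pushed through a YAML serializer becomes a YAML document, not plain text.
puts converted.to_yaml

# join collapses the Array into the single String the column expects.
text = converted.join
puts text.class
```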

And this actually works!

The final code:

require 'iconv'
u = User.find(1)
info = u.info
new_info = Iconv.iconv('UTF-8//IGNORE','euc-kr', info).join
u.update_attribute('info', new_info)
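
On Ruby 2.0 and later, where require 'iconv' only works with the iconv gem installed, the same idea can be sketched with force_encoding and encode. This is an illustration under assumptions, not a drop-in replacement: FakeUser and euc_kr_to_utf8 are hypothetical names, and the sample row is made up so the snippet runs without Rails or a database.

```ruby
# Hypothetical stand-in for the User model so the sketch runs without Rails.
FakeUser = Struct.new(:id, :name, :info)

# Relabel the string's bytes as EUC-KR, then transcode them to UTF-8.
# Bytes that cannot be converted are dropped, mirroring //IGNORE.
def euc_kr_to_utf8(raw)
  raw.dup
     .force_encoding("EUC-KR")
     .encode("UTF-8", invalid: :replace, undef: :replace, replace: "")
end

# Simulate a row whose info column holds EUC-KR bytes (sample text is made up).
user = FakeUser.new(1, "John Smith", "한국어".encode("EUC-KR").b)

# In Rails this would be u.update_attribute('info', euc_kr_to_utf8(u.info)).
user.info = euc_kr_to_utf8(user.info)
puts user.info
```

Because encode returns a String rather than an Array, no join is needed in this variant.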

Hope this helps whoever sees this (and knowing myself, probably future me).




Answer 2:


Why use Iconv to convert it? First, if the text looks correct in the database, make sure the database's charset is utf8. Then, on the script side, you can just save the Korean value directly, without using Iconv.



Source: https://stackoverflow.com/questions/6182380/how-to-convert-text-to-unicode-in-rails
