MySQL Convert latin1 data to UTF8

Posted by 天大地大妈咪最大 on 2019-11-27 19:55:19
luison

I've had cases like this in old WordPress installations, where the problem was that the data itself was already UTF-8 inside a Latin1 database (due to the WP default charset). This means the data did not actually need converting; only the database and table definitions did. In my experience things get messed up when doing the dump, because as I understand it MySQL uses the client's default character set, which in many cases is now UTF-8. It is therefore very important to export with the same character set the database declares, so that no conversion takes place. For a Latin1 database holding UTF-8 encoded data:

$ mysqldump --default-character-set=latin1 --databases wordpress > m.sql

Then replace the Latin1 references in the exported dump before reimporting it into a new UTF-8 database. Something like:

$ replace "CHARSET=latin1" "CHARSET=utf8" \
    "SET NAMES latin1" "SET NAMES utf8" < m.sql > m2.sql
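
The `replace` utility used above shipped with older MySQL releases and may not be installed on newer systems. As a rough stand-in, the same two substitutions can be sketched in Python. The function names `rewrite_charset_declarations` and `convert_dump_file` are illustrative, not part of any MySQL tooling, and the dump is handled as raw bytes on the assumption that the stored UTF-8 data must never be decoded or re-encoded along the way.

```python
def rewrite_charset_declarations(dump: bytes) -> bytes:
    """Swap latin1 declarations for utf8 without re-encoding any data bytes."""
    replacements = [
        (b"CHARSET=latin1", b"CHARSET=utf8"),
        (b"SET NAMES latin1", b"SET NAMES utf8"),
    ]
    for old, new in replacements:
        dump = dump.replace(old, new)
    return dump


def convert_dump_file(src_path: str, dst_path: str) -> None:
    """Stream a dump from src_path to dst_path, rewriting it line by line."""
    # Binary mode keeps the row data byte-for-byte identical.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            dst.write(rewrite_charset_declarations(line))


if __name__ == "__main__":
    sample = b"/*!40101 SET NAMES latin1 */;\n"
    print(rewrite_charset_declarations(sample))
```

Like the `replace` command, this changes the two strings wherever they occur, including inside row data, so it is worth checking the result before importing it.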

In my case this link was of great help. It is discussed here in Spanish.

Though it is probably no longer relevant to the OP, I happened to find a solution in the MySQL documentation for ALTER TABLE. I am posting it here for future reference:

Warning

The CONVERT TO operation converts column values between the character sets. This is not what you want if you have a column in one character set (like latin1) but the stored values actually use some other, incompatible character set (like utf8). In this case, you have to do the following for each such column:

ALTER TABLE t1 CHANGE c1 c1 BLOB;
ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET utf8;

The reason this works is that there is no conversion when you convert to or from BLOB columns.
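
To see why the BLOB round trip matters, the difference between converting and relabelling can be modelled on raw bytes in Python. This is only an analogy for what the server does, not MySQL code: Python's `latin-1` codec stands in for MySQL's latin1 (which is closer to cp1252), and the variable names are illustrative.

```python
# Bytes actually sitting in the latin1-declared column: the UTF-8 form of "é".
stored = "é".encode("utf-8")  # b'\xc3\xa9'

# Roughly what CONVERT TO does: read the bytes as latin1 text, then
# encode that text as utf8. Each original byte becomes its own character.
converted = stored.decode("latin-1").encode("utf-8")

# Roughly what going through BLOB does: the bytes stay the same and only
# the label on the column changes.
relabelled = stored

print(converted)                   # b'\xc3\x83\xc2\xa9'
print(converted.decode("utf-8"))   # Ã©
print(relabelled.decode("utf-8"))  # é
```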

LOAD DATA INFILE lets you specify, via its CHARACTER SET clause, the encoding the file is supposed to be in:

http://dev.mysql.com/doc/refman/5.1/en/load-data.html

I wrote http://code.google.com/p/mysqlutf8convertor/ to convert a Latin database to a UTF-8 database. It changes all tables and fields to UTF-8.

Converting latin1 to UTF8 is not what you want to do; you need roughly the opposite.

If what really happened was this:

  1. UTF-8 strings were interpreted as Latin-1 and transcoded to UTF-8, mangling them.
  2. You are now, or could be, reading UTF-8 strings with no further interpretation

What you must do now is:

  1. Read the "UTF-8" with no transcode.
  2. Convert it to Latin-1. Now you should actually have the original UTF-8.
  3. Now put it in your "UTF-8" column with no further conversion.
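
The three steps above can be sketched in Python for a single string. The helper name `repair_double_encoded` is hypothetical, and Python's `latin-1` codec is assumed to match the encoding that did the original damage; if the mangling went through cp1252 instead (typical when characters such as curly quotes or the euro sign are involved), `"cp1252"` would be the codec to try.

```python
def repair_double_encoded(text: str) -> str:
    """Undo one round of 'UTF-8 bytes read as Latin-1, then stored as UTF-8'."""
    # Step 1 has already happened: `text` is the mangled value as read.
    # Step 2: encoding to Latin-1 turns each character back into one byte,
    # which recovers the original UTF-8 byte sequence.
    original_bytes = text.encode("latin-1")
    # Step 3: interpret those bytes as the UTF-8 they always were.
    return original_bytes.decode("utf-8")


if __name__ == "__main__":
    print(repair_double_encoded("cafÃ©"))  # café
```

A value that was never double-encoded will usually raise `UnicodeEncodeError` or `UnicodeDecodeError` here, which is a convenient signal to leave that row alone. Pure ASCII text passes through unchanged.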

I recently completed a shell script that automates the conversion process. It can also be configured with custom filters for any text you wish to replace or remove, for example stripping HTML characters. Table whitelists and blacklists are also supported. You can download it from SourceForge: https://sourceforge.net/projects/mysqltr/

Try this:

1) Dump your DB

mysqldump --default-character-set=latin1 -u username -p databasename > dump.sql

2) Open dump.sql in a text editor and replace all occurrences of "SET NAMES latin1" with "SET NAMES utf8"

3) Create a new database and restore your dumpfile

cat dump.sql | mysql -u root -p newdbname