Detecting utf8 broken characters in MySQL

广开言路 2020-12-02 05:03

I've got a database with a bunch of broken utf8 characters scattered across several tables. The list of characters isn't very extensive AFAIK (áéíúóÁÉÍÓÚÑñ)

Fixing

18 answers
  • 2020-12-02 05:23

    There's a nice script to automate the conversion process across a whole database. It's also worth knowing that MySQL's utf8 charset is an incomplete UTF-8 implementation: it stores at most 3 bytes per character, so it cannot hold 4-byte characters such as emoji. The fix is to use the utf8mb4 charset, introduced in MySQL 5.5.3.
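    To make the 3-byte limit concrete, here is a small Python sketch (not part of the original answer) that counts the UTF-8 bytes needed by a few sample characters. The helper name `utf8_len` and the sample characters are illustrative assumptions.

```python
# Count how many bytes a single character needs when encoded as UTF-8.
def utf8_len(ch: str) -> int:
    """Return the number of bytes used to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))

# Illustrative samples: only the last one needs 4 bytes.
samples = {
    "ASCII letter": "a",
    "accented Latin letter": "ñ",
    "euro sign": "€",
    "emoji (outside the Basic Multilingual Plane)": "😀",
}

for label, ch in samples.items():
    print(f"{label}: U+{ord(ch):04X} needs {utf8_len(ch)} byte(s)")
```

    The accented letters from the question need only 2 bytes each, so the 3-byte limit is not what broke them; it matters mainly when choosing the target charset for the repaired data.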

  • 2020-12-02 05:25

    I fixed it with:

    UPDATE wp_zcs9ck_posts_copy SET post_title = 
        CONVERT(BINARY CONVERT(post_title USING latin1) USING utf8);
    

    Complete solution: http://jonisalonen.com/2012/fixing-doubly-utf-8-encoded-text-in-mysql/
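    For readers wondering why the nested CONVERT works, the same round trip can be sketched outside the database. This is a minimal Python illustration, assuming the damage came from UTF-8 bytes being read as cp1252 (the encoding MySQL calls latin1); the function names `mojibake` and `repair` are hypothetical.

```python
def mojibake(text: str) -> str:
    """Simulate the damage: UTF-8 bytes misread as cp1252 characters."""
    return text.encode("utf-8").decode("cp1252")

def repair(text: str) -> str:
    """Undo it: turn the characters back into bytes, then decode as UTF-8."""
    return text.encode("cp1252").decode("utf-8")

broken = mojibake("Señor Muñoz, café")
print(broken)          # the doubly encoded form
print(repair(broken))  # the original text again
```

    The inner `CONVERT(... USING latin1)` plays the role of `encode("cp1252")`, `BINARY` drops the charset label, and the outer `CONVERT(... USING utf8)` plays the role of `decode("utf-8")`.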

  • 2020-12-02 05:26

    Thanks for your answers!!

    I fixed my tables with the statements below and wanted to share the full list of changes. Besides the broken Latin characters, it also converts HTML entities back to plain characters; the data really was a mess:

    (If you need more conversions, look them up at https://www.utf8-chartable.de/unicode-utf8-table.pl)

    update `table` set `field` = replace(`field`, 'â€', '"');
    update `table` set `field` = replace(`field`, 'â€“', '–');
    update `table` set `field` = replace(`field`, 'â€¢', '-');
    update `table` set `field` = replace(`field`, 'â€œ', '"');
    
    update `table` set `field` = replace(`field`, 'Â¡', '¡');
    update `table` set `field` = replace(`field`, 'Â¢', '¢');
    update `table` set `field` = replace(`field`, 'Â£', '£');
    update `table` set `field` = replace(`field`, 'Â¤', '¤');
    update `table` set `field` = replace(`field`, 'Â¥', '¥');
    update `table` set `field` = replace(`field`, 'Â¦', '¦');
    update `table` set `field` = replace(`field`, 'Â§', '§');
    update `table` set `field` = replace(`field`, 'Â¨', '¨');
    update `table` set `field` = replace(`field`, 'Â©', '©');
    update `table` set `field` = replace(`field`, 'Âª', 'ª');
    update `table` set `field` = replace(`field`, 'Â«', '«');
    update `table` set `field` = replace(`field`, 'Â¬', '¬');
    # This one looks like it's missing a character, but it's there. 0xad
    update `table` set `field` = replace(`field`, '­', '­');
    update `table` set `field` = replace(`field`, 'Â®', '®');
    update `table` set `field` = replace(`field`, 'Â¯', '¯');
    update `table` set `field` = replace(`field`, 'Â°', '°');
    update `table` set `field` = replace(`field`, 'Â±', '±');
    update `table` set `field` = replace(`field`, 'Â²', '²');
    update `table` set `field` = replace(`field`, 'Â³', '³');
    update `table` set `field` = replace(`field`, 'Â´', '´');
    update `table` set `field` = replace(`field`, 'Âµ', 'µ');
    update `table` set `field` = replace(`field`, 'Â¶', '¶');
    update `table` set `field` = replace(`field`, 'Â·', '·');
    update `table` set `field` = replace(`field`, 'Â¸', '¸');
    update `table` set `field` = replace(`field`, 'Â¹', '¹');
    update `table` set `field` = replace(`field`, 'Âº', 'º');
    update `table` set `field` = replace(`field`, 'Â»', '»');
    update `table` set `field` = replace(`field`, 'Â¼', '¼');
    update `table` set `field` = replace(`field`, 'Â½', '½');
    update `table` set `field` = replace(`field`, 'Â¾', '¾');
    update `table` set `field` = replace(`field`, 'Â¿', '¿');
    
    update `table` set `field` = replace(`field`, 'Ã€', 'À');
    # This one looks like it's missing a character, but it's there. 0x81
    update `table` set `field` = replace(`field`, 'Ã', 'Á');
    update `table` set `field` = replace(`field`, 'Ã‚', 'Â');
    update `table` set `field` = replace(`field`, 'Ãƒ', 'Ã');
    update `table` set `field` = replace(`field`, 'Ã„', 'Ä');
    update `table` set `field` = replace(`field`, 'Ã…', 'Å');
    update `table` set `field` = replace(`field`, 'Ã†', 'Æ');
    update `table` set `field` = replace(`field`, 'Ã‡', 'Ç');
    update `table` set `field` = replace(`field`, 'Ãˆ', 'È');
    update `table` set `field` = replace(`field`, 'Ã‰', 'É');
    update `table` set `field` = replace(`field`, 'ÃŠ', 'Ê');
    update `table` set `field` = replace(`field`, 'Ã‹', 'Ë');
    update `table` set `field` = replace(`field`, 'ÃŒ', 'Ì');
    # This one looks like it's missing a character, but it's there. 0x8d
    update `table` set `field` = replace(`field`, 'Ã', 'Í');
    update `table` set `field` = replace(`field`, 'ÃŽ', 'Î');
    # This one looks like it's missing a character, but it's there. 0x8f
    update `table` set `field` = replace(`field`, 'Ã', 'Ï');
    # This one looks like it's missing a character, but it's there. 0x90
    update `table` set `field` = replace(`field`, 'Ã', 'Ð');
    update `table` set `field` = replace(`field`, 'Ã‘', 'Ñ');
    update `table` set `field` = replace(`field`, 'Ã’', 'Ò');
    update `table` set `field` = replace(`field`, 'Ã“', 'Ó');
    update `table` set `field` = replace(`field`, 'Ã”', 'Ô');
    update `table` set `field` = replace(`field`, 'Ã•', 'Õ');
    update `table` set `field` = replace(`field`, 'Ã–', 'Ö');
    update `table` set `field` = replace(`field`, 'Ã—', '×');
    update `table` set `field` = replace(`field`, 'Ã˜', 'Ø');
    update `table` set `field` = replace(`field`, 'Ã™', 'Ù');
    update `table` set `field` = replace(`field`, 'Ãš', 'Ú');
    update `table` set `field` = replace(`field`, 'Ã›', 'Û');
    update `table` set `field` = replace(`field`, 'Ãœ', 'Ü');
    # This one looks like it's missing a character, but it's there. 0x9d
    update `table` set `field` = replace(`field`, 'Ã', 'Ý');
    update `table` set `field` = replace(`field`, 'Ãž', 'Þ');
    update `table` set `field` = replace(`field`, 'ÃŸ', 'ß');
    update `table` set `field` = replace(`field`, 'à', 'à');
    update `table` set `field` = replace(`field`, 'Ã¡', 'á');
    update `table` set `field` = replace(`field`, 'Ã¢', 'â');
    update `table` set `field` = replace(`field`, 'Ã£', 'ã');
    update `table` set `field` = replace(`field`, 'Ã¤', 'ä');
    update `table` set `field` = replace(`field`, 'Ã¥', 'å');
    update `table` set `field` = replace(`field`, 'Ã¦', 'æ');
    update `table` set `field` = replace(`field`, 'Ã§', 'ç');
    update `table` set `field` = replace(`field`, 'Ã¨', 'è');
    update `table` set `field` = replace(`field`, 'Ã©', 'é');
    update `table` set `field` = replace(`field`, 'Ãª', 'ê');
    update `table` set `field` = replace(`field`, 'Ã«', 'ë');
    update `table` set `field` = replace(`field`, 'Ã¬', 'ì');
    # This one looks like it's missing a character, but it's there. 0xad
    update `table` set `field` = replace(`field`, '­­Ã', 'í');
    update `table` set `field` = replace(`field`, 'Ã®', 'î');
    update `table` set `field` = replace(`field`, 'Ã¯', 'ï');
    update `table` set `field` = replace(`field`, 'Ã°', 'ð');
    update `table` set `field` = replace(`field`, 'Ã±', 'ñ');
    update `table` set `field` = replace(`field`, 'Ã²', 'ò');
    update `table` set `field` = replace(`field`, 'Ã³', 'ó');
    update `table` set `field` = replace(`field`, 'Ã´', 'ô');
    update `table` set `field` = replace(`field`, 'Ãµ', 'õ');
    update `table` set `field` = replace(`field`, 'Ã¶', 'ö');
    update `table` set `field` = replace(`field`, 'Ã·', '÷');
    update `table` set `field` = replace(`field`, 'Ã¸', 'ø');
    update `table` set `field` = replace(`field`, 'Ã¹', 'ù');
    update `table` set `field` = replace(`field`, 'Ãº', 'ú');
    update `table` set `field` = replace(`field`, 'Ã»', 'û');
    update `table` set `field` = replace(`field`, 'Ã¼', 'ü');
    update `table` set `field` = replace(`field`, 'Ã½', 'ý');
    update `table` set `field` = replace(`field`, 'Ã¾', 'þ');
    update `table` set `field` = replace(`field`, 'Ã¿', 'ÿ');
    
    update `table` set `field` = replace(`field` ,'&ccedil;','ç');
    update `table` set `field` = replace(`field` ,'&atilde;','ã');
    update `table` set `field` = replace(`field` ,'&aacute;','á');
    update `table` set `field` = replace(`field` ,'&acirc;','â');
    update `table` set `field` = replace(`field` ,'&eacute;','é');
    update `table` set `field` = replace(`field` ,'&iacute;','í');
    update `table` set `field` = replace(`field` ,'&otilde;','õ');
    update `table` set `field` = replace(`field` ,'&uacute;','ú');
    update `table` set `field` = replace(`field` ,'&ccedil;','ç');
    update `table` set `field` = replace(`field` ,'&Aacute;','Á');
    update `table` set `field` = replace(`field` ,'&Acirc;','Â');
    update `table` set `field` = replace(`field` ,'&Eacute;','É');
    update `table` set `field` = replace(`field` ,'&Iacute;','Í');
    update `table` set `field` = replace(`field` ,'&Otilde;','Õ');
    update `table` set `field` = replace(`field` ,'&Uacute;','Ú');
    update `table` set `field` = replace(`field` ,'&Ccedil;','Ç');
    update `table` set `field` = replace(`field` ,'&Atilde;','Ã');
    update `table` set `field` = replace(`field` ,'&Agrave;','À');
    update `table` set `field` = replace(`field` ,'&Ecirc;','Ê');
    update `table` set `field` = replace(`field` ,'&Oacute;','Ó');
    update `table` set `field` = replace(`field` ,'&Ocirc;','Ô');
    update `table` set `field` = replace(`field` ,'&Uuml;','Ü');
    update `table` set `field` = replace(`field` ,'&atilde;','ã');
    update `table` set `field` = replace(`field` ,'&agrave;','à');
    update `table` set `field` = replace(`field` ,'&ecirc;','ê');
    update `table` set `field` = replace(`field` ,'&oacute;','ó');
    update `table` set `field` = replace(`field` ,'&ocirc;','ô');
    update `table` set `field` = replace(`field` ,'&uuml;','ü');
    update `table` set `field` = replace(`field` ,'&amp;','&');
    update `table` set `field` = replace(`field` ,'&gt;','>');
    update `table` set `field` = replace(`field` ,'&lt;','<');
    update `table` set `field` = replace(`field` ,'&circ;','ˆ');
    update `table` set `field` = replace(`field` ,'&tilde;','˜');
    update `table` set `field` = replace(`field` ,'&uml;','¨');
    update `table` set `field` = replace(`field` ,'&acute;','´');
    update `table` set `field` = replace(`field` ,'&cedil;','¸');
    update `table` set `field` = replace(`field` ,'&quot;','"');
    update `table` set `field` = replace(`field` ,'&ldquo;','“');
    update `table` set `field` = replace(`field` ,'&rdquo;','”');
    update `table` set `field` = replace(`field` ,'&lsquo;','‘');
    update `table` set `field` = replace(`field` ,'&rsquo;','’');
    update `table` set `field` = replace(`field` ,'&lsaquo;','‹');
    update `table` set `field` = replace(`field` ,'&rsaquo;','›');
    update `table` set `field` = replace(`field` ,'&laquo;','«');
    update `table` set `field` = replace(`field` ,'&raquo;','»');
    update `table` set `field` = replace(`field` ,'&ordm;','º');
    update `table` set `field` = replace(`field` ,'&ordf;','ª');
    update `table` set `field` = replace(`field` ,'&ndash;','–');
    update `table` set `field` = replace(`field` ,'&mdash;','—');
    update `table` set `field` = replace(`field` ,'&macr;','¯');
    update `table` set `field` = replace(`field` ,'&hellip;','…');
    update `table` set `field` = replace(`field` ,'&brvbar;','¦');
    update `table` set `field` = replace(`field` ,'&bull;','•');
    update `table` set `field` = replace(`field` ,'&para;','¶');
    update `table` set `field` = replace(`field` ,'&sect;','§');
    update `table` set `field` = replace(`field` ,'&sup1;','¹');
    update `table` set `field` = replace(`field` ,'&sup2;','²');
    update `table` set `field` = replace(`field` ,'&sup3;','³');
    update `table` set `field` = replace(`field` ,'&frac12;','½');
    update `table` set `field` = replace(`field` ,'&frac14;','¼');
    update `table` set `field` = replace(`field` ,'&frac34;','¾');
    update `table` set `field` = replace(`field` ,'&#8539;','⅛');
    update `table` set `field` = replace(`field` ,'&#8540;','⅜');
    update `table` set `field` = replace(`field` ,'&#8541;','⅝');
    update `table` set `field` = replace(`field` ,'&#8542;','⅞');
    update `table` set `field` = replace(`field` ,'&gt;','>');
    update `table` set `field` = replace(`field` ,'&lt;','<');
    update `table` set `field` = replace(`field` ,'&plusmn;','±');
    update `table` set `field` = replace(`field` ,'&minus;','−');
    update `table` set `field` = replace(`field` ,'&times;','×');
    update `table` set `field` = replace(`field` ,'&divide;','÷');
    update `table` set `field` = replace(`field` ,'&lowast;','∗');
    update `table` set `field` = replace(`field` ,'&frasl;','⁄');
    update `table` set `field` = replace(`field` ,'&permil;','‰');
    update `table` set `field` = replace(`field` ,'&int;','∫');
    update `table` set `field` = replace(`field` ,'&sum;','∑');
    update `table` set `field` = replace(`field` ,'&prod;','∏');
    update `table` set `field` = replace(`field` ,'&radic;','√');
    update `table` set `field` = replace(`field` ,'&infin;','∞');
    update `table` set `field` = replace(`field` ,'&asymp;','≈');
    update `table` set `field` = replace(`field` ,'&cong;','≅');
    update `table` set `field` = replace(`field` ,'&prop;','∝');
    update `table` set `field` = replace(`field` ,'&equiv;','≡');
    update `table` set `field` = replace(`field` ,'&ne;','≠');
    update `table` set `field` = replace(`field` ,'&le;','≤');
    update `table` set `field` = replace(`field` ,'&ge;','≥');
    update `table` set `field` = replace(`field` ,'&there4;','∴');
    update `table` set `field` = replace(`field` ,'&sdot;','⋅');
    update `table` set `field` = replace(`field` ,'&middot;','·');
    update `table` set `field` = replace(`field` ,'&part;','∂');
    update `table` set `field` = replace(`field` ,'&image;','ℑ');
    update `table` set `field` = replace(`field` ,'&real;','ℜ');
    update `table` set `field` = replace(`field` ,'&prime;','′');
    update `table` set `field` = replace(`field` ,'&Prime;','″');
    update `table` set `field` = replace(`field` ,'&deg;','°');
    update `table` set `field` = replace(`field` ,'&ang;','∠');
    update `table` set `field` = replace(`field` ,'&perp;','⊥');
    update `table` set `field` = replace(`field` ,'&nabla;','∇');
    update `table` set `field` = replace(`field` ,'&oplus;','⊕');
    update `table` set `field` = replace(`field` ,'&otimes;','⊗');
    update `table` set `field` = replace(`field` ,'&alefsym;','ℵ');
    update `table` set `field` = replace(`field` ,'&oslash;','ø');
    update `table` set `field` = replace(`field` ,'&Oslash;','Ø');
    update `table` set `field` = replace(`field` ,'&isin;','∈');
    update `table` set `field` = replace(`field` ,'&notin;','∉');
    update `table` set `field` = replace(`field` ,'&cap;','∩');
    update `table` set `field` = replace(`field` ,'&cup;','∪');
    update `table` set `field` = replace(`field` ,'&sub;','⊂');
    update `table` set `field` = replace(`field` ,'&sup;','⊃');
    update `table` set `field` = replace(`field` ,'&sube;','⊆');
    update `table` set `field` = replace(`field` ,'&supe;','⊇');
    update `table` set `field` = replace(`field` ,'&exist;','∃');
    update `table` set `field` = replace(`field` ,'&forall;','∀');
    update `table` set `field` = replace(`field` ,'&empty;','∅');
    update `table` set `field` = replace(`field` ,'&not;','¬');
    update `table` set `field` = replace(`field` ,'&and;','∧');
    update `table` set `field` = replace(`field` ,'&or;','∨');
    update `table` set `field` = replace(`field` ,'&crarr;','↵');
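
    The broken form of each Latin character in a list like this does not have to be looked up by hand; it can be derived mechanically. The following Python sketch is an illustration only, assuming the damage came from UTF-8 bytes read as cp1252; `mojibake_pairs` is a hypothetical helper, and `table` and `field` are the same placeholders used above.

```python
def mojibake_pairs(first: int = 0xA1, last: int = 0xFF):
    """Yield (broken, correct) pairs for the code points first..last."""
    for cp in range(first, last + 1):
        correct = chr(cp)
        raw = correct.encode("utf-8")
        try:
            broken = raw.decode("cp1252")
        except UnicodeDecodeError:
            # Bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in Python's
            # cp1252 codec; latin-1 maps them to invisible control characters,
            # which is why some patterns look like a lone 'Ã'.
            broken = raw.decode("latin-1")
        yield broken, correct

# Map each correct character to its broken form.
pairs = {correct: broken for broken, correct in mojibake_pairs()}

# Print a few generated statements as a sample.
for correct in "¡Ññé":
    print(f"update `table` set `field` = replace(`field`, '{pairs[correct]}', '{correct}');")
```

    Generating the statements this way also keeps the invisible second characters intact, which are easy to lose when copying the list by hand.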
    
  • 2020-12-02 05:26

    In addition to Raúl Avila Solano's and acseven's answers: if you want to update all the broken characters in one query, you can do:

    update `table` set field = replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(field,'&uuml;','ü'),'&ocirc;','ô'),'&oacute;','ó'),'&ecirc;','ê'),'&agrave;','à'),'&atilde;','ã'),'&Uuml;','Ü'),'&Ocirc;','Ô'),'&Oacute;','Ó'),'&Ecirc;','Ê'),'&Agrave;','À'),'&Atilde;','Ã'),'&Ccedil;','Ç'),'&Uacute;','Ú'),'&Otilde;','Õ'),'&Iacute;','Í'),'&Iacute;','Í'),'&Eacute;','É'),'&Acirc;','Â'),'&Aacute;','Á'),'&ccedil;','ç'),'&uacute;','ú'),'&otilde;','õ'),'&iacute;','í'),'&eacute;','é'),'&acirc;','â'),'&aacute;','á'),'&atilde;','ã'),'&ccedil;','ç'),'à ','à'),'à ','à'),'º','º'),'ª','ª'),'ç','ç'),'–','–'),'ó','ó'),'é','é'),'á','á'),'ê','ê'),'ã','ã'),'â','â'),'í','í'),'õ','õ'),'Ø','Ø'),'•','-'),'ú','ú'),'à ','À'),'Ã','Ã'),'Ç','Ç'),'â€','"'),'“','"'),'É','É');
    
  • 2020-12-02 05:26

    To convert all the broken Latin characters back to their correct accented forms, try this in MySQL:

    UPDATE your_table SET your_column = CONVERT(CAST(CONVERT(your_column USING latin1) AS BINARY) USING utf8)
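
    A statement like this touches every row, including rows that were never broken. As a hedged illustration of how one might detect which values are actually doubly encoded before changing them, here is a Python sketch; it assumes the cp1252 misreading discussed in this thread, and `repair_if_double_encoded` is a hypothetical name.

```python
def repair_if_double_encoded(text: str) -> str:
    """Return the repaired text, or the input unchanged if it does not
    look like UTF-8 that was misread as cp1252."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside cp1252, or its bytes are
        # not valid UTF-8: in both cases it was not doubly encoded.
        return text

def is_double_encoded(text: str) -> bool:
    """True when repairing would change the text."""
    return repair_if_double_encoded(text) != text

for value in ["cafÃ©", "café", "plain ascii"]:
    print(repr(value), is_double_encoded(value))
```

    The check is a heuristic: a value that legitimately contains a sequence such as "Ã©" would also be reported as broken.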
    
  • 2020-12-02 05:29
    UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã¡','á');
    UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã¤','ä');
    UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã©','é');
    UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í©','é');
    UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã³','ó');
    UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'íº','ú');
    UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ãº','ú');
    UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã±','ñ');
    UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í‘','Ñ');
    UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã','í');
    UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'â€“','–');
    UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'’','\'');
    UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'…','...');
    UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'–','-');
    UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'“','"');
    UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€','"');
    UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'‘','\'');
    UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'•','-');
    UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'‡','c');
    UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Â','');
    