Detect encoding and make everything UTF-8

暗喜 2020-11-22 03:03

I'm reading out lots of texts from various RSS feeds and inserting them into my database.

Of course, there are several different character encodings used in the fee

24 answers
  •  一向 (original poster)
    2020-11-22 03:07

    I have been looking for solutions to encoding problems for ages, and this page is probably the conclusion of years of searching! I tested some of the suggestions mentioned here, and here are my notes:

    This is my test string:

    this is a "wròng wrìtten" string bùt I nèed to pù 'sòme' special chàrs to see thèm, convertèd by fùnctìon!! & that's it!

    I do an INSERT to save this string in a database field whose collation is utf8_general_ci.

    The character set of my page is UTF-8.

    If I do the INSERT as-is, my database ends up with some characters that probably come from Mars...

    So I need to convert them into some "sane" UTF-8. I tried utf8_encode(), but alien characters were still invading my database...
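
    A likely reason utf8_encode() did not help: it assumes its input is ISO-8859-1, so running it on text that is already UTF-8 encodes every multibyte character a second time. The sketch below reproduces that effect with mb_convert_encoding(), which performs the same ISO-8859-1 to UTF-8 conversion. The variable names are illustrative only, and whether this is exactly what happened with the test string above is an assumption.

    ```php
    <?php
    // "ò" encoded as UTF-8 is the two bytes 0xC3 0xB2.
    $alreadyUtf8 = "wr\u{00F2}ng";

    // Treating those bytes as ISO-8859-1 (which is what utf8_encode() does)
    // re-encodes each byte as its own character: 0xC3 0xB2 becomes "Ã" + "²".
    $doubleEncoded = mb_convert_encoding($alreadyUtf8, 'UTF-8', 'ISO-8859-1');

    echo bin2hex($alreadyUtf8), "\n";    // 7772c3b26e67
    echo bin2hex($doubleEncoded), "\n";  // 7772c383c2b26e67

    // Running the conversion in the opposite direction restores the original bytes.
    $repaired = mb_convert_encoding($doubleEncoded, 'ISO-8859-1', 'UTF-8');
    var_dump($repaired === $alreadyUtf8); // bool(true)
    ```

    The longer hex dump is the "alien" form: the same text, but with each accented letter stored as four bytes instead of two.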

    So I tried the forceUTF8 function posted in answer number 8, but the string saved in the database looks like this:

    this is a "wròng wrìtten" string bùt I nèed to pù 'sòme' special chà rs to see thèm, convertèd by fùnctìon!! & that's it!

    So, after collecting more information from this page and combining it with information from other pages, I solved my problem with this:

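    // Convert $string from its detected encoding to the encoding used by
    // the current MySQL connection ($resourceID is an open mysql_* link).
    // Note: the mysql_* extension was removed in PHP 7.0; with mysqli the
    // counterpart is mysqli_character_set_name($link).
    $finallyIDidIt = mb_convert_encoding(
      $string,
      mysql_client_encoding($resourceID),  // target encoding
      mb_detect_encoding($string)          // source encoding (best guess)
    );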

    Now in my database I have my string with correct encoding.

    NOTE: The one thing to watch out for is mysql_client_encoding()! You need to be connected to the database, because this function takes a connection resource as its parameter.

    But I do that re-encoding right before my INSERT anyway, so for me it is not a problem.
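
    For readers without a database connection at hand, here is a minimal sketch of the same detect-then-convert idea with a fixed UTF-8 target. The helper name toUtf8() and the candidate list are assumptions made for illustration, not part of the solution above. mb_detect_encoding() is only a heuristic, so the candidate list should name the encodings you actually expect in your feeds.

    ```php
    <?php
    // Hypothetical helper (not from the answer above): normalise a string to UTF-8.
    function toUtf8(string $text, array $candidates = ['UTF-8', 'ISO-8859-1']): string
    {
        // Text that is already valid UTF-8 is returned untouched,
        // which avoids encoding it a second time.
        if (mb_check_encoding($text, 'UTF-8')) {
            return $text;
        }
        // Strict mode (third argument) rejects candidates for which
        // the input contains invalid byte sequences.
        $detected = mb_detect_encoding($text, $candidates, true);
        if ($detected === false) {
            throw new InvalidArgumentException('Could not detect the encoding');
        }
        return mb_convert_encoding($text, 'UTF-8', $detected);
    }

    $latin1 = "ch\xE0rs";      // "chàrs" encoded as ISO-8859-1
    $utf8   = "ch\u{00E0}rs";  // the same word encoded as UTF-8

    var_dump(toUtf8($latin1) === $utf8); // bool(true)
    var_dump(toUtf8($utf8) === $utf8);   // bool(true)
    ```

    Checking for valid UTF-8 first is a design choice: single-byte encodings such as ISO-8859-1 accept almost any byte sequence, so trying them before UTF-8 would misread correct input.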
