Remove non-utf8 characters from string

后端 未结 18 1736
心在旅途
心在旅途 2020-11-22 11:56

Im having a problem with removing non-utf8 characters from string, which are not displaying properly. Characters are like this 0x97 0x61 0x6C 0x6F (hex representation)

18条回答
  •  悲&欢浪女
    2020-11-22 12:38

    If you apply utf8_encode() to an already UTF8 string it will return a garbled UTF8 output.

    I made a function that addresses all this issues. It´s called Encoding::toUTF8().

    You dont need to know what the encoding of your strings is. It can be Latin1 (ISO8859-1), Windows-1252 or UTF8, or the string can have a mix of them. Encoding::toUTF8() will convert everything to UTF8.

    I did it because a service was giving me a feed of data all messed up, mixing those encodings in the same string.

    Usage:

    require_once('Encoding.php'); 
    use \ForceUTF8\Encoding;  // It's namespaced now.
    
    $utf8_string = Encoding::toUTF8($mixed_string);
    
    $latin1_string = Encoding::toLatin1($mixed_string);
    

    I've included another function, Encoding::fixUTF8(), which will fix every UTF8 string that looks garbled product of having been encoded into UTF8 multiple times.

    Usage:

    require_once('Encoding.php'); 
    use \ForceUTF8\Encoding;  // It's namespaced now.
    
    $utf8_string = Encoding::fixUTF8($garbled_utf8_string);
    

    Examples:

    echo Encoding::fixUTF8("Fédération Camerounaise de Football");
    echo Encoding::fixUTF8("Fédération Camerounaise de Football");
    echo Encoding::fixUTF8("FÃÂédÃÂération Camerounaise de Football");
    echo Encoding::fixUTF8("Fédération Camerounaise de Football");
    

    will output:

    Fédération Camerounaise de Football
    Fédération Camerounaise de Football
    Fédération Camerounaise de Football
    Fédération Camerounaise de Football
    

    Download:

    https://github.com/neitanod/forceutf8

提交回复
热议问题