PHP character encoding hell reading csv file with fgets

感情败类 2020-12-19 17:33

I have a web site that receives a CSV file by FTP once a month. For years it was an ASCII file. Now I'm receiving UTF-8 one month then UTF-16BE the next and UTF-16LE the m

2 Answers
  • 2020-12-19 18:19

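    Explicitly pass the candidate encodings to mb_detect_encoding, in the order they should be tried, and set its strict parameter. Also read the whole file with file_get_contents: fgets splits on the single byte 0x0A, so on a UTF-16LE file it cuts in the middle of the two-byte newline and leaves every following line misaligned.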

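    <?php
    header( "Content-Type: text/html; charset=utf-8" );
    // $file_in is the path to the received CSV file, as in the question.
    $input = file_get_contents( $file_in );
    
    // Candidates are tried in this order; strict mode rejects invalid byte sequences.
    $encoding = mb_detect_encoding( $input, array(
        "UTF-8",
        "UTF-32",
        "UTF-32BE",
        "UTF-32LE",
        "UTF-16",
        "UTF-16BE",
        "UTF-16LE"
    ), TRUE );
    
    // mb_detect_encoding returns FALSE when no candidate matches.
    if( $encoding === FALSE ) {
        die( "Could not detect the encoding of $file_in" );
    }
    if( $encoding !== "UTF-8" ) {
        $input = mb_convert_encoding( $input, "UTF-8", $encoding );
    }
    echo "<p>$encoding</p>";
    
    // Split on \r\n, \r or \n instead of PHP_EOL, which depends on the server's OS.
    foreach( preg_split( '/\r\n|\r|\n/', $input ) as $line ) {
        var_dump( $line );
    }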
    

    The order is important because UTF-8 and UTF-32 are more restrictive, while UTF-16 is extremely permissive: pretty much any even-length sequence of bytes is valid UTF-16.
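
    To illustrate the point about ordering, here is a small sketch using mb_check_encoding. It assumes the mbstring extension is loaded; the sample strings and the candidates_accepting helper are made up for this example. A short, even-length run of plain ASCII bytes is accepted as UTF-8 and as both UTF-16 variants, so the stricter candidate has to be listed first, while a byte pair that is invalid UTF-8 is still accepted as UTF-16.

```php
<?php
// Hypothetical helper: returns the candidates under which $bytes is valid.
function candidates_accepting( string $bytes, array $candidates ): array {
    $accepted = array();
    foreach( $candidates as $encoding ) {
        if( mb_check_encoding( $bytes, $encoding ) ) {
            $accepted[] = $encoding;
        }
    }
    return $accepted;
}

$candidates = array( "UTF-8", "UTF-16BE", "UTF-16LE" );

// Six ASCII bytes: valid UTF-8, but they also read as three UTF-16 code units.
$ascii = "ab,cd\n";
// 0xC3 followed by 0x28 is not valid UTF-8, yet reads as one UTF-16 code unit.
$notUtf8 = "\xC3\x28";

var_dump( candidates_accepting( $ascii, $candidates ) );
var_dump( candidates_accepting( $notUtf8, $candidates ) );
```

    If UTF-16 came first in the list, the ASCII sample would be reported as UTF-16 and converted into unrelated characters.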

    The only way to retain all the information is to convert to a Unicode encoding, not ASCII.
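
    Returning to the fgets warning at the top of this answer, the following sketch shows what goes wrong. It writes two short UTF-16LE lines to an in-memory stream (the sample data and the use of php://memory are assumptions for the demo, not part of the original setup) and reads them back with fgets. Because fgets stops right after the 0x0A byte, the 0x00 that completes the newline ends up at the start of the next piece.

```php
<?php
// Two lines, "a,b" and "c,d", encoded as UTF-16LE (the newline is the bytes 0A 00).
$utf16 = mb_convert_encoding( "a,b\nc,d\n", "UTF-16LE", "UTF-8" );

// Write the bytes to an in-memory stream so fgets can read them like a file.
$stream = fopen( "php://memory", "w+" );
fwrite( $stream, $utf16 );
rewind( $stream );

$pieces = array();
while( ( $piece = fgets( $stream ) ) !== false ) {
    $pieces[] = $piece;
}
fclose( $stream );

// Dump each piece as hex to show where fgets cut the data.
foreach( $pieces as $piece ) {
    echo bin2hex( $piece ), PHP_EOL;
}
```

    The first piece has an odd number of bytes and every later piece starts with a stray 00, so none of them can be converted as UTF-16LE on its own. Reading the whole file and converting it before splitting into lines avoids this.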

  • 2020-12-19 18:35

    My suggestion would be to just convert everything to UTF-8 or ASCII (I can't quite tell from the code you posted which of the two you're aiming for):

    $utf8Line = iconv( mb_detect_encoding( $line ), 'UTF-8', $line );
    

    or...

    $asciiLine = iconv( mb_detect_encoding( $line ), 'ASCII', $line );
    

    You can leverage mb_detect_encoding to do the heavy lifting for you.
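
    One caveat with the one-liners above: mb_detect_encoding returns false when it cannot decide, and without an explicit candidate list it falls back to the configured detect order, which typically does not include UTF-16. A minimal sketch of a guarded version follows; the to_utf8 helper name and the candidate list are hypothetical and should be adapted to the encodings you actually expect.

```php
<?php
// Hypothetical helper: convert $text to UTF-8, or return null if detection fails.
function to_utf8( string $text, array $candidates ): ?string {
    // Strict mode makes mb_detect_encoding return false instead of a best guess.
    $from = mb_detect_encoding( $text, $candidates, true );
    if( $from === false ) {
        return null;
    }
    $converted = iconv( $from, "UTF-8", $text );
    return $converted === false ? null : $converted;
}

// The candidate list is an assumption made for this example.
$candidates = array( "UTF-8", "ISO-8859-1" );

var_dump( to_utf8( "plain ascii", $candidates ) );
var_dump( to_utf8( "caf\xE9", $candidates ) );         // Latin-1 bytes
var_dump( to_utf8( "\xFF\xFF", array( "UTF-8" ) ) );   // not valid UTF-8
```

    Returning null lets the caller decide whether to skip the input or report it, rather than passing false on to iconv as the source encoding.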
