PHP character encoding hell reading csv file with fgets


I have a web site that receives a CSV file by FTP once a month. For years it was an ASCII file. Now I'm receiving UTF-8 one month then UTF-16BE the next and UTF-16LE the m


2 Answers

旧巷少年郎

2020-12-19 18:19
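Pass the candidate encodings to mb_detect_encoding explicitly, in the order they should be tried, and enable the strict parameter. Also read the file with file_get_contents rather than fgets: fgets splits on the single byte 0x0A, so on a UTF-16LE file it cuts each line in the middle of a two-byte code unit.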
```
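<?php
header( "Content-Type: text/html; charset=utf-8" );
// $file_in is the path to the received CSV file, as in the question.
$input = file_get_contents( $file_in );

// Most restrictive encodings first; strict mode (TRUE) rejects invalid byte sequences.
$encoding = mb_detect_encoding( $input, array(
    "UTF-8",
    "UTF-32",
    "UTF-32BE",
    "UTF-32LE",
    "UTF-16",
    "UTF-16BE",
    "UTF-16LE"
), TRUE );

// mb_detect_encoding returns false when none of the candidates match.
if( $encoding === false ) {
    die( "<p>Could not detect the encoding</p>" );
}

if( $encoding !== "UTF-8" ) {
    $input = mb_convert_encoding( $input, "UTF-8", $encoding );
}
echo "<p>$encoding</p>";

// Split on any line ending (\r\n, \r or \n); PHP_EOL depends on the server OS, not on the file.
foreach( preg_split( '/\r\n|\r|\n/', $input ) as $line ) {
    var_dump( $line );
}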
```
The order is important because UTF-8 and UTF-32 are more restrictive, while UTF-16 is extremely permissive: almost any even-length sequence of bytes is valid UTF-16.
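As a small illustration of that permissiveness, the sketch below checks one arbitrary two-byte string against UTF-8 and UTF-16 with mb_check_encoding. The byte pair is chosen only for this example; it is not taken from the question's file.

```php
<?php
// 0xC3 starts a two-byte UTF-8 sequence, but 0x28 is not a continuation byte,
// so this pair is not valid UTF-8.
$bytes = "\xC3\x28";

var_dump( mb_check_encoding( $bytes, "UTF-8" ) );    // bool(false)
var_dump( mb_check_encoding( $bytes, "UTF-16BE" ) ); // bool(true), read as U+C328
var_dump( mb_check_encoding( $bytes, "UTF-16LE" ) ); // bool(true), read as U+28C3
```

If UTF-16 were listed first, it would claim input like this before the stricter encodings had a chance to reject it.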

The only way to retain all of the information is to convert to a Unicode encoding, not ASCII.
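The fgets problem mentioned at the top of this answer can be reproduced without a real file. This is a minimal sketch that writes a two-line UTF-16LE string to an in-memory stream (php://memory stands in for the received CSV file) and reads it back line by line:

```php
<?php
// "a\nb\n" in UTF-16LE: every character takes two bytes, low byte first.
$utf16le = mb_convert_encoding( "a\nb\n", "UTF-16LE", "UTF-8" );

$stream = fopen( "php://memory", "r+" );
fwrite( $stream, $utf16le );
rewind( $stream );

$first  = fgets( $stream ); // stops right after the 0x0A byte
$second = fgets( $stream ); // starts with the 0x00 that belonged to the newline
fclose( $stream );

echo bin2hex( $first ), "\n";  // 61000a
echo bin2hex( $second ), "\n"; // 0062000a
```

The first line has an odd number of bytes and the second is shifted by one byte, so neither can be decoded as UTF-16LE on its own. Reading the whole file and converting it before splitting avoids this.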
深忆病人

2020-12-19 18:35
My suggestion would be to just convert everything to UTF-8 or ASCII (it isn't clear from the code you posted which of the two you are aiming for):
```
$utf8Line = iconv( mb_detect_encoding( $line ), 'UTF-8', $line );
```
or...
```
$asciiLine = iconv( mb_detect_encoding( $line ), 'ASCII', $line );
```
You can leverage mb_detect_encoding to do the heavy lifting for you.
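One caveat: when mb_detect_encoding is called without a candidate list it uses the configured detect order, which by default covers only ASCII and UTF-8, so UTF-16 input is not recognised. A possible workaround, sketched below, is to look for a byte order mark first and only fall back to detection when none is present. The helper name to_utf8 is hypothetical, and the sketch assumes the whole file has been read into one string; whether the files in the question actually carry a BOM is not known.

```php
<?php
// Hypothetical helper: returns $bytes converted to UTF-8, with any BOM removed.
function to_utf8( string $bytes ): string {
    // Longest marks first, so a UTF-32LE BOM is not mistaken for a UTF-16LE one.
    $boms = array(
        "UTF-32BE" => "\x00\x00\xFE\xFF",
        "UTF-32LE" => "\xFF\xFE\x00\x00",
        "UTF-8"    => "\xEF\xBB\xBF",
        "UTF-16BE" => "\xFE\xFF",
        "UTF-16LE" => "\xFF\xFE",
    );
    foreach( $boms as $encoding => $bom ) {
        if( strncmp( $bytes, $bom, strlen( $bom ) ) === 0 ) {
            $body = substr( $bytes, strlen( $bom ) );
            return $encoding === "UTF-8"
                ? $body
                : mb_convert_encoding( $body, "UTF-8", $encoding );
        }
    }
    // No BOM: fall back to strict detection with an explicit candidate list.
    $encoding = mb_detect_encoding( $bytes, array( "UTF-8", "UTF-16BE", "UTF-16LE" ), true );
    if( $encoding === false ) {
        throw new RuntimeException( "Unable to detect encoding" );
    }
    return mb_convert_encoding( $bytes, "UTF-8", $encoding );
}

// Usage: a UTF-16LE string with a BOM comes back as plain UTF-8.
$sample = "\xFF\xFE" . mb_convert_encoding( "caf\u{00E9},1", "UTF-16LE", "UTF-8" );
var_dump( to_utf8( $sample ) === "caf\u{00E9},1" );
```

The result can then be split into lines and parsed, for example with str_getcsv.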