problem with fgetcsv( ) and Unicode

Submitted by 独自空忆成欢 on 2019-12-06 12:45:17
timdream

Note:

Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function.

http://php.net/fgetcsv

One possible solution is to call setlocale() so that LC_CTYPE matches the file's encoding before you call fgetcsv().
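Here is a rough sketch of the setlocale() idea. It assumes a locale matching the file's encoding is installed on the system, and locale names such as en_US.ISO-8859-1 vary between systems. The helper readCsvWithLocale() is a hypothetical name. It switches LC_CTYPE only for the duration of the read and then restores the previous setting. The demo uses the "C" locale only because it is always available.

```php
<?php
// Hypothetical helper: read a CSV file with LC_CTYPE temporarily set to
// $locale (assumed to match the file's encoding), then restore the old locale.
function readCsvWithLocale(string $path, string $locale, string $delimiter = ','): array
{
    // Passing "0" returns the current setting without changing it.
    $previous = setlocale(LC_CTYPE, '0');

    // setlocale() returns false if the requested locale is not installed.
    if (setlocale(LC_CTYPE, $locale) === false) {
        throw new RuntimeException("Locale '$locale' is not available");
    }

    try {
        $handle = fopen($path, 'r');
        if ($handle === false) {
            throw new RuntimeException("Cannot open $path");
        }
        $rows = [];
        // Enclosure and escape are passed explicitly (these are PHP's defaults).
        while (($row = fgetcsv($handle, 0, $delimiter, '"', '\\')) !== false) {
            $rows[] = $row;
        }
        fclose($handle);
        return $rows;
    } finally {
        // Always put the caller's locale back, even if reading failed.
        setlocale(LC_CTYPE, $previous);
    }
}

// Demo: a small ASCII CSV in a temporary file, read under the "C" locale.
$tmp = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($tmp, "a,b\nc,d\n");
$before = setlocale(LC_CTYPE, '0');
$rows = readCsvWithLocale($tmp, 'C');
unlink($tmp);
```

Restoring the locale matters because the PHP manual notes that locale information is kept per process, not per thread. On a multithreaded server API, a changed locale can therefore leak into other scripts.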

Another thing to watch for is the UTF byte order mark, or BOM. The BOM is the character U+FEFF. In UTF-8 it is encoded as three bytes (0xEF, 0xBB and 0xBF) that sit at the very beginning of the text file. In UTF-16 the BOM indicates the byte order, but in UTF-8 it is not really necessary.

So you need to detect those three bytes and remove them. Below is a simplified example of how to do that:

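// Read the whole file into a string
$str = file_get_contents('file.utf8.csv');
// The UTF-8 BOM: bytes 0xEF 0xBB 0xBF
$bom = pack("CCC", 0xef, 0xbb, 0xbf);
// If the string starts with the BOM, strip those three bytes
if (0 == strncmp($str, $bom, 3)) {
    echo "BOM detected - file is UTF-8\n";
    $str = substr($str, 3);
}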
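The string-based check above works when the whole file is loaded with file_get_contents(). If you read the file row by row with fgetcsv() instead, you can do a similar check on the stream itself. This is a minimal sketch that assumes a UTF-8 file. openCsvSkippingBom() is a hypothetical helper name, and the temporary file in the demo exists only to make the example runnable.

```php
<?php
// Hypothetical helper: open a CSV file and leave the handle positioned just
// past a UTF-8 BOM (if present), so fgetcsv() never sees those three bytes.
function openCsvSkippingBom(string $path)
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    // Compare the first three bytes with the UTF-8 BOM (EF BB BF).
    if (fread($handle, 3) !== "\xEF\xBB\xBF") {
        rewind($handle); // No BOM (or a file shorter than 3 bytes): start over at byte 0.
    }
    return $handle;
}

// Demo: write a BOM-prefixed CSV to a temporary file and read its header row.
$tmp = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($tmp, "\xEF\xBB\xBFname,age\nAnn,30\n");
$handle = openCsvSkippingBom($tmp);
$header = fgetcsv($handle, 0, ',', '"', '\\');
fclose($handle);
unlink($tmp);
```

If the BOM is not skipped, it stays attached to the first field of the first row. A column that should be named "name" then comes back as "\xEF\xBB\xBFname", and string comparisons against it fail.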

That's all

I used iconv to convert each field to UTF-8, and it works almost perfectly in my situation. I hope it helps someone else too.

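$csvFile = fopen('file/path', "r");
if ($csvFile === false) {
    die("Cannot open the CSV file\n");
}
// Skip the header row
fgetcsv($csvFile, 1000, ";");
// Read the remaining ";"-separated rows and convert each field
// from Windows-1252 to UTF-8 before printing it on its own line
while (($row = fgetcsv($csvFile, 1000, ";")) !== false) {
    foreach ($row as $field) {
        echo iconv("Windows-1252", "UTF-8", $field), "\n";
    }
}
fclose($csvFile);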
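A different technique is to convert the bytes before fgetcsv() parses them, using PHP's convert.iconv.* stream filter. This is not the approach used in the answer above. With the filter attached, the parser only ever sees UTF-8, which should avoid the locale mismatch described in the manual note at the top. This sketch assumes the iconv extension is enabled and the input really is Windows-1252. readWindows1252Csv() is a hypothetical name.

```php
<?php
// Hypothetical helper (alternative technique): convert Windows-1252 bytes to
// UTF-8 with a read stream filter, so fgetcsv() parses UTF-8 text.
function readWindows1252Csv(string $path, string $delimiter = ';'): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    // Filter name format: convert.iconv.<from>/<to> (needs the iconv extension).
    stream_filter_append($handle, 'convert.iconv.WINDOWS-1252/UTF-8', STREAM_FILTER_READ);

    $rows = [];
    while (($row = fgetcsv($handle, 0, $delimiter, '"', '\\')) !== false) {
        $rows[] = $row;
    }
    fclose($handle);
    return $rows;
}

// Demo: "caf\xE9" is "café" encoded in Windows-1252.
$tmp = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($tmp, "name;drink\nAnn;caf\xE9\n");
$rows = readWindows1252Csv($tmp);
unlink($tmp);
```

Converting the whole stream up front means each field does not need its own iconv() call. It also means the delimiter search runs on the same encoding the rest of the script uses.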