fwrite() and UTF8

十年热恋 提交于 2019-11-27 20:51:15
Florin Sima

If you know the data is in UTF8 than you want to set up the header.

I wrote a solution answering to another tread.

The solution is the following: As the UTF-8 byte-order mark is \xef\xbb\xbf we should add it to the document's header.

<?php
function writeStringToFile($file, $string){
    $f=fopen($file, "wb");
    $file="\xEF\xBB\xBF".$file; // this is what makes the magic
    fputs($f, $string);
    fclose($f);
}
?>

You can adapt it to your code, basically you just want to make sure that you write a UTF8 file (as you said you know your content is UTF8 encoded).

hakre

fwrite() is not binary safe. That means, that your data - be it correctly encoded or not - might get mangled by this command or it's underlying routines.

To be on the safe side, you should use fopen() with the binary mode flag. that's b. Afterwards, fwrite() will safe your string data "as-is", and that is in PHP until now binary data, because strings in PHP are binary strings.

Background: Some systems differ between text and binary data. The binary flag will explicitly command PHP on such systems to use the binary output. When you deal with UTF-8 you should take care that the data does not get's mangeled. That's prevented by handling the string data as binary data.

However: If it's not like you told in your question that the UTF-8 encoding of the data is preserved, than your encoding got broken and even binary safe handling will keep the broken status. However, with the binary flag you still ensure that this is not the fwrite() part of your application that is breaking things.

It has been rightfully written in another answer here, that you do not know the encoding if you have data only. However, you can validate data if it validates UTF-8 encoding or not, so giving you at least some chance to check the encoding. A function in PHP which does this I've posted in a UTF-8 releated question so it might be of use for you if you need to debug things: Answer to: SimpleXML and Chinese look for can_be_valid_utf8_statemachine, that's the name of the function.

The problem is your data is double encoded. I assume your original text is something like:

Don’t do anything

with , i.e., not the straight apostrophe, but the right single quotation mark.

If you write a PHP script with this content and encoded in UTF-8:

<?php
//File in UTF-8
echo utf8_encode("Don’t"); //this will double encode

You will get something similar to your output.

//add BOM to fix UTF-8 in Excel
fputs($fp, $bom =( chr(0xEF) . chr(0xBB) . chr(0xBF) ));

I find this piece works for me :)

I know all my data is in UTF8 - wrong.
Encoding it's not the format of a file. So, check charset in headers of the page, where you taking data from:
header("Content-type: text/html; charset=utf-8;");
And check if data really in multi-byte encoding:
if (strlen($data)==mb_strlen($data, 'UTF-8')) print 'not UTF-8';
else print 'utf-8';

$handle = fopen($file,"w");
fwrite($handle, pack("CCC",0xef,0xbb,0xbf));
fwrite($handle,$file); 
fclose($handle);

The only thing I had to do is add a UTF8 BOM to the CSV, the data was correct but the file reader (external application) couldn't read the file properly without the BOM

Try this simple method that is more useful and add to the top of the page before tag <body> :

<head>
  <meta charset="utf-8">
</head>
标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!