Question
I cannot get Cyrillic characters in PHP from a .txt file with an unknown encoding. I have tried almost everything I could find on the web. What PHP function do I need to use to get the contents of this file?
https://www.dropbox.com/s/w7cex4wiogyytvm/100004-6.txt
EDIT
Input:
$path = WWW_ROOT . 'files' . DS . '100002-6.txt';
$string = file_get_contents($path);
debug($string);
Output: the debug output is broken, and if I try to save the value to the database, the save fails (the BOM causes some trouble and the value cannot be saved).
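Since a BOM is suspected, it is cheap to check for one before converting anything. Below is a minimal sketch, assuming PHP 8 (for `str_starts_with`). `detectBom` and `stripUtf8Bom` are hypothetical helper names, and only the three most common BOMs are recognised. Legacy single-byte code pages have no BOM at all, so `null` here is itself a hint that the file is not Unicode.

```php
<?php
// Hypothetical helper: report which byte order mark (if any) starts $bytes.
// Only the UTF-8, UTF-16LE and UTF-16BE BOMs are checked.
function detectBom(string $bytes): ?string
{
    if (str_starts_with($bytes, "\xEF\xBB\xBF")) {
        return 'UTF-8';
    }
    if (str_starts_with($bytes, "\xFF\xFE")) {
        return 'UTF-16LE';
    }
    if (str_starts_with($bytes, "\xFE\xFF")) {
        return 'UTF-16BE';
    }
    return null; // no BOM: single-byte code pages never carry one
}

// Hypothetical helper: drop a leading UTF-8 BOM so it does not reach the database.
function stripUtf8Bom(string $bytes): string
{
    return str_starts_with($bytes, "\xEF\xBB\xBF") ? substr($bytes, 3) : $bytes;
}

$sample = "\xEF\xBB\xBFabc";
var_dump(detectBom($sample));    // string(5) "UTF-8"
var_dump(stripUtf8Bom($sample)); // string(3) "abc"
```

In the code from the question this would be used as `$string = stripUtf8Bom(file_get_contents($path));`.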
Input:
$path = WWW_ROOT . 'files' . DS . '100002-6.txt';
$string = file_get_contents($path);
$string = mb_convert_encoding($string, 'utf-8');
debug($string);
Output:
'????? ???:300/500V
???? ???:2000V
????? ???? ??????: ? +70??
?? ??? ?? (????? 5 ??.): ? +160??
????? ?????? ?? ?????: ? +5?? '
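The question marks are consistent with how `mb_convert_encoding` behaves when no source encoding is given: it treats the input as the internal encoding (UTF-8 by default), and byte sequences that are invalid in that encoding come out as the substitute character `?`. Passing the source encoding explicitly removes the guess. The sketch below assumes the bytes are Windows-1251 (the encoding named in the answer further down) and uses the word "Номинален" as stand-in data.

```php
<?php
// "Номинален" as raw Windows-1251 bytes, one byte per letter.
$cp1251 = "\xCD\xEE\xEC\xE8\xED\xE0\xEB\xE5\xED";

// No source encoding: the bytes are read as the internal encoding (normally UTF-8),
// which they are not valid in, so substitute characters appear.
$guessed = mb_convert_encoding($cp1251, 'UTF-8');

// Explicit source encoding: each byte maps to the intended Cyrillic letter.
$explicit = mb_convert_encoding($cp1251, 'UTF-8', 'Windows-1251');

echo $guessed, PHP_EOL;
echo $explicit, PHP_EOL; // Номинален
```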
Input:
$path = WWW_ROOT . 'files' . DS . '100002-6.txt';
$string = file_get_contents($path);
$string = iconv("UTF-16", "UTF-8//TRANSLIT//IGNORE", $string);
debug($string);
Output:
췮㌰〯㔰ざഊ죱㈰〰嘍્⃰㨠㜰냑ഊ쿰밠⣭㔠⤺⃤⬱㘰냑ഊ췠볭
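Decoding as UTF-16 glues consecutive bytes together into 16-bit code units, so single-byte text turns into unrelated CJK characters like the output above. A rough way to rule UTF-16 out: every ASCII character (digits, `V`, `:`, `/`, spaces, line breaks) is stored next to a 0x00 byte in UTF-16, while ordinary text in a single-byte code page contains no 0x00 bytes. `couldBeUtf16` is a hypothetical heuristic, not a standard test.

```php
<?php
// Hypothetical heuristic: UTF-16 text with any ASCII in it contains 0x00 bytes
// and always has an even length; single-byte text normally has no 0x00 bytes.
function couldBeUtf16(string $bytes): bool
{
    if ($bytes === '' || strlen($bytes) % 2 !== 0) {
        return false; // well-formed UTF-16 data has an even number of bytes
    }
    return substr_count($bytes, "\x00") > 0;
}

$ascii = '300/500V';
var_dump(couldBeUtf16($ascii));                                           // bool(false)
var_dump(couldBeUtf16(mb_convert_encoding($ascii, 'UTF-16LE', 'UTF-8'))); // bool(true)
```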
Input:
$path = WWW_ROOT . 'files' . DS . '100002-6.txt';
$string = file_get_contents($path);
$string = iconv("ISO-8859-5", "UTF-8//TRANSLIT//IGNORE", $string);
debug($string);
Output:
Эюьшэрыхэ эряюэ:300/500V
Шёяшђхэ эряюэ:2000V
ЭрМтшёюър №рсюђэр ђхьях№рђѓ№р: фю +70Аб
Я№ш ъ№рђюъ ёяюМ (эрМьэюуѓ 5 ёхъ.): фю +160Аб
ЭрМэшёър ђхьях№рђѓ№р я№ш шэёђрырішМр: фю +5Аб
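This output is more useful than it looks: it is real Cyrillic, just the wrong letters, which is typical when single-byte text is decoded with the wrong single-byte code page. Because ISO-8859-5 maps each byte to exactly one character, the mistake can be undone: convert the garbled text back to ISO-8859-5 to recover the original bytes, then decode those bytes with another candidate. A small sketch, assuming the iconv extension; `redecode` is a hypothetical name. With CP1251 as the candidate, the first line above becomes "Номинален напон:300/500V".

```php
<?php
// Hypothetical helper: undo a wrong decoding and try another one.
// $garbled is UTF-8 text produced by decoding the bytes as $wrong; re-encoding
// it as $wrong recovers the original bytes, which are then decoded as $candidate.
function redecode(string $garbled, string $wrong, string $candidate): string
{
    $bytes = iconv('UTF-8', $wrong, $garbled);
    if ($bytes === false) {
        throw new RuntimeException("Text cannot be represented in $wrong");
    }
    $fixed = iconv($candidate, 'UTF-8', $bytes);
    if ($fixed === false) {
        throw new RuntimeException("Bytes are not valid $candidate");
    }
    return $fixed;
}

// First line of the ISO-8859-5 output from the question.
echo redecode('Эюьшэрыхэ эряюэ:300/500V', 'ISO-8859-5', 'CP1251'), PHP_EOL; // Номинален напон:300/500V
```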
Now that I have tested multiple files, I no longer think the input file is Unicode-encoded. I succeeded in reading my test file, but the one that matters (whose encoding I don't know) still doesn't work. So I changed the question: the encoding is still unknown.
To clarify a bit more: I can open this file and it displays normally in Notepad. It contains Cyrillic characters, which are what cause this problem.
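One way to back up the suspicion that the file is not Unicode: runs of Cyrillic letters in a single-byte code page such as CP1251 use bytes 0xC0 to 0xFF back to back, and such sequences are practically never valid UTF-8. Notepad showing the text correctly is consistent with this, since for a file without a BOM that is not valid UTF-8 it typically falls back to the system ANSI code page, which is Windows-1251 on a Cyrillic-locale machine. A sketch using `mb_check_encoding`; `describeBytes` is a hypothetical name.

```php
<?php
// Hypothetical helper: classify a byte string as plain ASCII, valid UTF-8,
// or "not UTF-8" (for example a legacy single-byte code page).
function describeBytes(string $bytes): string
{
    if (!preg_match('/[\x80-\xFF]/', $bytes)) {
        return 'ASCII'; // no high bytes, so every ASCII-compatible encoding agrees
    }
    return mb_check_encoding($bytes, 'UTF-8') ? 'UTF-8' : 'not UTF-8';
}

echo describeBytes('300/500V'), PHP_EOL;                              // ASCII
echo describeBytes("\xD0\x9D"), PHP_EOL;                              // UTF-8 ("Н")
echo describeBytes("\xCD\xEE\xEC\xE8\xED\xE0\xEB\xE5\xED"), PHP_EOL;  // not UTF-8
```

Called on `file_get_contents($path)`, a result of "not UTF-8" rules out UTF-8 but does not say which code page it is.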
Answer 1:
The file is encoded in CP1251 a.k.a. MS-CYRL a.k.a. "Cyrillic (Windows)".
$string = file_get_contents($path);
$string = iconv('CP1251', 'UTF-8', $string);
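The two lines above assume that reading the file and converting it both succeed. A slightly more defensive version, as a sketch: `readCp1251File` is a hypothetical name, and throwing `RuntimeException` is just one way to surface failures. Both `file_get_contents` and `iconv` return `false` on failure.

```php
<?php
// Hypothetical helper: read a file assumed to be Windows-1251 and return UTF-8.
function readCp1251File(string $path): string
{
    $raw = file_get_contents($path);
    if ($raw === false) {
        throw new RuntimeException("Cannot read $path");
    }
    $utf8 = iconv('CP1251', 'UTF-8', $raw);
    if ($utf8 === false) {
        // iconv returns false if it meets a byte it cannot convert.
        throw new RuntimeException("Conversion of $path from CP1251 failed");
    }
    return $utf8;
}

// Usage: write a CP1251 sample to a temporary file and read it back as UTF-8.
$tmp = tempnam(sys_get_temp_dir(), 'cp1251');
file_put_contents($tmp, "\xC8\xF1\xEF\xE8\xF2\xE5\xED \xED\xE0\xEF\xEE\xED:2000V");
echo readCp1251File($tmp), PHP_EOL; // Испитен напон:2000V
unlink($tmp);
```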
How did I figure this out? I opened the file in a text editor and tried a few relevant encodings until it looked right. There is hardly anything else you can do if the file encoding is unknown.
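The same trial and error can be scripted: decode the bytes with each plausible Cyrillic code page and see which version reads as real words. A sketch assuming the iconv extension; the candidate list (CP1251, KOI8-R, ISO-8859-5, CP866) is an assumption covering common legacy Cyrillic encodings, and a person still has to judge which output is right.

```php
<?php
// Hypothetical helper: decode the same bytes with several code pages so the
// results can be compared by eye. Conversions that fail are skipped.
function decodeCandidates(string $bytes, array $encodings): array
{
    $results = [];
    foreach ($encodings as $encoding) {
        $text = @iconv($encoding, 'UTF-8', $bytes);
        if ($text !== false) {
            $results[$encoding] = $text;
        }
    }
    return $results;
}

$candidates = ['CP1251', 'KOI8-R', 'ISO-8859-5', 'CP866'];
// Stand-in for file_get_contents($path): "Номинален" encoded as CP1251.
$bytes = "\xCD\xEE\xEC\xE8\xED\xE0\xEB\xE5\xED";

foreach (decodeCandidates($bytes, $candidates) as $encoding => $text) {
    printf("%-10s %s\n", $encoding, $text);
}
```

Only the CP1251 line reads as a real word; the ISO-8859-5 line reproduces the garbled text from the question.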
Source: https://stackoverflow.com/questions/22963377/file-get-contents-on-file-with-cyrillic-characters-and-undefined-encoding