file_get_contents on file with cyrillic characters and undefined encoding

Submitted by 半腔热情 on 2019-12-11 19:52:38

Question


I cannot read Cyrillic characters in PHP from a .txt file with an unknown encoding. I have tried almost everything I could find on the web. Which PHP function do I need to use to get the contents of this file?

https://www.dropbox.com/s/w7cex4wiogyytvm/100004-6.txt

EDIT

Input:

    $path = WWW_ROOT . 'files' . DS . '100002-6.txt';
    $string = file_get_contents($path);
    debug($string);

Output: the debug() output is broken, and saving the value to the database fails (a BOM seems to cause trouble, so the value cannot be saved).

Input:

    $path = WWW_ROOT . 'files' . DS . '100002-6.txt';
    $string = file_get_contents($path);
    $string = mb_convert_encoding ($string , 'utf-8');
    debug($string);

Output:

    '????? ???:300/500V
    ???? ???:2000V
    ????? ???? ??????: ? +70??
    ?? ??? ?? (????? 5 ??.): ? +160??
    ????? ?????? ?? ?????: ? +5??   '

Input:

    $path = WWW_ROOT . 'files' . DS . '100002-6.txt';
    $string = file_get_contents($path);
    $string = iconv("UTF-16", "UTF-8//TRANSLIT//IGNORE", $string);
    debug($string);

Output:

췮㌰〯㔰ざഊ죱㈰〰嘍્⃰⃲㨠‫㜰냑ഊ쿰⃱밠⣭㔠⤺⃤⬱㘰냑ഊ췠볭

Input:

    $path = WWW_ROOT . 'files' . DS . '100002-6.txt';
    $string = file_get_contents($path);
    $string = iconv("ISO-8859-5", "UTF-8//TRANSLIT//IGNORE", $string);
    debug($string);

Output:

    Эюьшэрыхэ эряюэ:300/500V
    Шёяшђхэ эряюэ:2000V
    ЭрМтшёюър №рсюђэр ђхьях№рђѓ№р: фю +70Аб
    Я№ш ъ№рђюъ ёяюМ (эрМьэюуѓ 5 ёхъ.): фю +160Аб
    ЭрМэшёър ђхьях№рђѓ№р я№ш шэёђрырішМр: фю +5Аб

Now that I have tested multiple files, I no longer think the input file is Unicode-encoded. I managed to read my test file, but the one that matters (whose encoding I don't know) still fails. So I changed the question; the encoding still seems to be undefined.

A little more for clarity: I can open this file and view it normally in Notepad. It contains the Cyrillic characters that cause this problem.


Answer 1:


The file is encoded in CP1251 a.k.a. MS-CYRL a.k.a. "Cyrillic (Windows)".

    $string = file_get_contents($path);
    $string = iconv('CP1251', 'UTF-8', $string);

How did I figure this out? I opened the file in a text editor and tried a few relevant encodings until the text looked right. There's hardly anything else you can do if the file encoding is unknown.
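
The same trial-and-error can be done from PHP instead of a text editor. The following is only a minimal sketch, assuming the iconv and mbstring extensions are loaded and that the iconv build knows the listed encoding names (this varies by platform). The helper name `previewEncodings` and the candidate list are made up for illustration, and the sample bytes stand in for the result of `file_get_contents($path)`.

```php
<?php
// Hypothetical helper (not part of PHP): decode $bytes with each candidate
// encoding and return a short preview so a human can pick the readable one.
function previewEncodings(string $bytes, array $candidates, int $length = 60): array
{
    $previews = [];
    foreach ($candidates as $encoding) {
        // @ hides the notice iconv raises for bytes that are invalid in $encoding.
        $decoded = @iconv($encoding, 'UTF-8', $bytes);
        $previews[$encoding] = $decoded === false
            ? null
            : mb_substr($decoded, 0, $length, 'UTF-8');
    }
    return $previews;
}

// Sample input: the word "Привет" encoded as CP1251.
// In real use this would be: $bytes = file_get_contents($path);
$bytes = "\xCF\xF0\xE8\xE2\xE5\xF2";

// Candidate list is an assumption: common single-byte Cyrillic encodings.
$candidates = ['CP1251', 'KOI8-R', 'ISO-8859-5', 'CP866'];

foreach (previewEncodings($bytes, $candidates) as $encoding => $text) {
    echo $encoding, ': ', $text ?? '(conversion failed)', PHP_EOL;
}
```

Only one of the printed lines should look like real Russian text; that line names the encoding to pass to `iconv`. This does not detect anything automatically. It just makes the manual comparison quicker.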



Source: https://stackoverflow.com/questions/22963377/file-get-contents-on-file-with-cyrillic-characters-and-undefined-encoding
