Simple html dom character encoding issue

Submitted by 拜拜、爱过 on 2019-11-28 01:32:07

Question


Hey guys, I'm using Simple HTML DOM to retrieve content from another website, but there's a character encoding issue with the content it retrieves: some characters show up as the little diamond with a question mark inside (the Unicode replacement character, U+FFFD).

The problem only affects the retrieved content; all other text on my site displays fine.

If anyone could help, that would be great.
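A quick way to confirm that the scraped text is the culprit is to check whether it is valid UTF-8 before echoing it into a UTF-8 page, since browsers render bytes that are invalid in the page's encoding as that diamond. This is a minimal sketch assuming your page is served as UTF-8; `looks_like_utf8` is a hypothetical helper name, and it relies on the fact that `preg_match()` with the `u` modifier fails on malformed UTF-8 input.

```php
<?php
// Hypothetical helper: true if $text is a well-formed UTF-8 byte sequence.
// With the /u modifier, preg_match() returns false (not 0) on malformed UTF-8.
function looks_like_utf8(string $text): bool
{
    return preg_match('//u', $text) === 1;
}

$latin1 = "caf\xE9";     // "café" as ISO-8859-1 bytes (é = 0xE9)
$utf8   = "caf\xC3\xA9"; // "café" as UTF-8 bytes (é = 0xC3 0xA9)

var_dump(looks_like_utf8($latin1)); // bool(false): shows as a diamond on a UTF-8 page
var_dump(looks_like_utf8($utf8));   // bool(true)
```

If the check returns false for the scraped text, the bytes need converting (or decompressing) before output, which is what the answers below address.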


Answer 1:


Try using iconv to convert the charset of the scraped text to the charset you use on your page.

Signature:

iconv(string $from_encoding, string $to_encoding, string $string): string|false

Example:

echo iconv("ISO-8859-1", "UTF-8", $text);
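Because `iconv()` returns `false` when the input contains bytes that are invalid in the source charset, it can be worth wrapping the call. A minimal sketch, assuming the source charset is already known (ISO-8859-1 here, as in the example above); `to_utf8` is a hypothetical helper name.

```php
<?php
// Hypothetical wrapper: convert $text from $sourceCharset to UTF-8,
// falling back to the original string if iconv() reports a failure.
function to_utf8(string $text, string $sourceCharset = 'ISO-8859-1'): string
{
    $converted = @iconv($sourceCharset, 'UTF-8', $text); // @ hides the notice on bad input
    return $converted === false ? $text : $converted;
}

echo to_utf8("caf\xE9"), "\n"; // prints "café" on a UTF-8 terminal
```

Returning the original string on failure keeps the page rendering (possibly still with diamonds) instead of printing nothing; throwing an exception would be an equally reasonable design.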



Answer 2:


I had this problem too, but it wasn't a charset problem: it was gzip compression, which Simple HTML DOM doesn't handle. Here is my solution: use the function file_get_html2 instead of file_get_html.

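// Fetch $url with cURL and return the decoded response body (false on failure).
function curl($url){
    // Browser-like request headers; some sites serve different content to unknown clients.
    $headers = array();
    $headers[] = "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13";
    $headers[] = "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
    $headers[] = "Accept-Language: en-us,en;q=0.5";
    $headers[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $headers[] = "Keep-Alive: 115";
    $headers[] = "Connection: keep-alive";
    $headers[] = "Cache-Control: max-age=0";

    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
    // "" = advertise every encoding cURL supports (gzip, deflate, ...) and decompress
    // the response automatically, so no manual Accept-Encoding header is needed.
    curl_setopt($curl, CURLOPT_ENCODING, "");
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    $data = curl_exec($curl);
    curl_close($curl);
    return $data;
}

// Drop-in replacement for Simple HTML DOM's file_get_html() that handles compressed responses.
function file_get_html2($url){
    $html = curl($url);
    return $html === false ? false : str_get_html($html);
}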
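If the body you hand to Simple HTML DOM might still be compressed (for example, because it was fetched without cURL), the problem can also be spotted by hand: gzip data always starts with the two bytes 0x1F 0x8B, which is why an undecoded gzip body shows up as gibberish and replacement characters. A minimal sketch, assuming PHP's zlib extension is available for `gzdecode()`; `maybe_gunzip` is a hypothetical helper, and you would pass its result to `str_get_html()`.

```php
<?php
// Hypothetical helper: if $body is gzip-compressed (magic bytes 1F 8B),
// decompress it; otherwise return it unchanged.
function maybe_gunzip(string $body): string
{
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = gzdecode($body);
        if ($decoded !== false) {
            return $decoded;
        }
    }
    return $body;
}

$compressed = gzencode('<p>café</p>');
echo maybe_gunzip($compressed), "\n";    // <p>café</p>
echo maybe_gunzip('<p>plain</p>'), "\n"; // <p>plain</p>
```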



Answer 3:


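Go to the website and check its charset by viewing the page info, then convert from that charset. If you'd rather detect it automatically, pass mb_detect_encoding() an explicit list of candidates and enable strict mode, since the default detection order usually doesn't include ISO-8859-1:

$encoding = mb_detect_encoding($text, "UTF-8, ISO-8859-1", true); // UTF-8 if valid, else ISO-8859-1
$text = iconv($encoding, "UTF-8//TRANSLIT//IGNORE", $text);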
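Instead of checking the page info by hand, you can read the charset the page declares about itself. A minimal sketch, assuming the page declares its charset in a `<meta charset=...>` or `<meta http-equiv="Content-Type" ...>` tag; the HTTP `Content-Type` header (available from cURL via `curl_getinfo($curl, CURLINFO_CONTENT_TYPE)`) is another source this sketch does not handle. `declared_charset` and `html_to_utf8` are hypothetical helpers, and the regex is deliberately simple rather than a full HTML parser.

```php
<?php
// Hypothetical helper: return the charset declared in a <meta> tag, or null.
// Matches both <meta charset="x"> and content="text/html; charset=x".
function declared_charset(string $html): ?string
{
    if (preg_match('/<meta[^>]+charset=["\']?([A-Za-z0-9_\-:.]+)/i', $html, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

// Convert $html to UTF-8 using the declared charset (UTF-8 assumed if none is found).
function html_to_utf8(string $html): string
{
    $charset = declared_charset($html) ?? 'UTF-8';
    if ($charset === 'UTF-8') {
        return $html;
    }
    $converted = @iconv($charset, 'UTF-8', $html); // false if the charset is unknown or input is bad
    return $converted === false ? $html : $converted;
}

$page = '<meta charset="iso-8859-1"><p>caf' . "\xE9" . '</p>';
echo html_to_utf8($page), "\n"; // <meta charset="iso-8859-1"><p>café</p>
```

Note that the converted markup still carries the old `<meta>` declaration; Simple HTML DOM ignores it, but strip or rewrite it if you re-serve the HTML.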


Source: https://stackoverflow.com/questions/4550903/simple-html-dom-character-encoding-issue
