Simple html dom character encoding issue

后端 未结 3 1563
灰色年华
灰色年华 2020-12-18 11:07

I am using simple html dom to retrieve content from another website, but the thing is theres a character encoding issue with the stuff retrieved using simple html dom. The c

3条回答
  •  夕颜
    夕颜 (楼主)
    2020-12-18 11:30

    I had this problem too, but it was not the charset problem.It was gzip compression that simple html dom doesn't handle. Here is my solution. Use the function file_get_html2 instead file_get_html.

    function curl($url){
        $headers[]  = "User-Agent:Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13";
        $headers[]  = "Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        $headers[]  = "Accept-Language:en-us,en;q=0.5";
        $headers[]  = "Accept-Encoding:gzip,deflate";
        $headers[]  = "Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.7";
        $headers[]  = "Keep-Alive:115";
        $headers[]  = "Connection:keep-alive";
        $headers[]  = "Cache-Control:max-age=0";
    
        $curl = curl_init();
        curl_setopt($curl, CURLOPT_URL, $url);
        curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
        curl_setopt($curl, CURLOPT_ENCODING, "gzip");
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
        $data = curl_exec($curl);
        curl_close($curl);
        return $data;
    
    }
    function file_get_html2($url){
        return str_get_html(curl($url));
    }
    

提交回复
热议问题