file_get_contents() converts UTF-8 to ISO-8859-1

灰色年华 2020-12-15 02:10

I am trying to get search results from yahoo.com.

But file_get_contents() seems to convert the UTF-8 content (the charset that Yahoo uses) to ISO-8859-1.

4 Answers
  • 2020-12-15 02:34
    // Convert the downloaded body (not the URL string) from ISO-8859-1 to UTF-8.
    $s2 = iconv("ISO-8859-1", "UTF-8//TRANSLIT//IGNORE", file_get_contents($filename));
    

    Better solution...

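    function curl($url){
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
        curl_setopt($ch, CURLOPT_ENCODING, '');          // empty string: accept every compression cURL supports
        $body = curl_exec($ch);
        curl_close($ch);  // must run before return, otherwise it is never reached
        return $body;
    }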
    
    echo curl($filename);
    
  • 2020-12-15 02:48

    file_get_contents should not change the charset. The data is pulled in as a binary string.

    When checking out the url you provided, this is the header it provides:

    Content-Type: text/html; charset=ISO-8859-1
    

    Also, in the body:

    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
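To check which charset a response actually declares, the header lines can be inspected programmatically. The following is only a rough sketch: `charset_from_headers()` is a hypothetical helper (not part of PHP), and the sample array is hand-written to mimic the list of header lines PHP makes available after an HTTP request via `file_get_contents()`.

```php
<?php
// Hypothetical helper: returns the charset declared in the Content-Type
// header of the final response, or null if none is declared.
function charset_from_headers(array $headers): ?string
{
    $charset = null;
    foreach ($headers as $line) {
        // A new status line starts a new response (e.g. after a redirect),
        // so forget whatever the previous hop declared.
        if (stripos($line, 'HTTP/') === 0) {
            $charset = null;
            continue;
        }
        if (preg_match('/^Content-Type:.*?charset\s*=\s*"?([\w.:-]+)/i', $line, $m)) {
            $charset = strtoupper($m[1]);
        }
    }
    return $charset;
}

// Hand-written sample, shaped like the header lines of a real response.
$sample = [
    'HTTP/1.1 200 OK',
    'Content-Type: text/html; charset=ISO-8859-1',
];
echo charset_from_headers($sample), "\n"; // ISO-8859-1
```

If this reports ISO-8859-1, the server sent Latin-1 bytes and file_get_contents() simply passed them through unchanged.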
    

    Also, you can't losslessly convert UTF-8 to ISO-8859-1 and get the same characters back when converting to UTF-8 again. UTF-8 / Unicode supports far more characters than ISO-8859-1, so any character outside ISO-8859-1 is already lost in the first step.
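The lossy round trip is easy to demonstrate. This is a small sketch that assumes the mbstring extension is available; `round_trip()` is a hypothetical helper, and the test characters are "é" plus the "š" and "ť" that appear in the search term of the URL in question.

```php
<?php
// Hypothetical helper: UTF-8 -> ISO-8859-1 -> UTF-8.
// Assumes the mbstring extension is loaded.
function round_trip(string $utf8): string
{
    $latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
    return mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
}

$survives = "\u{E9}";          // "é" (U+00E9) exists in ISO-8859-1
$lost     = "\u{161}\u{165}";  // "š" and "ť" do not

var_dump(round_trip($survives) === $survives); // bool(true)
var_dump(round_trip($lost) === $lost);         // bool(false)
```

Once the characters have been replaced during the first conversion, no later conversion can bring them back.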

    In the browser this is not the case, so perhaps you just need to send a suitable Accept-Charset header to tell Yahoo's server that you can accept UTF-8.

  • 2020-12-15 02:55

    For anyone investigating on this:

    The time I spent on encoding issues taught me that PHP functions rarely change the encoding of strings "magically". One of the rare exceptions is:

    exec( $command, $output, $returnVal )

    Please note also that the working header set is as follows:

    header('Content-Type: text/html; charset=utf-8');

    and not:

    header('Content-Type: text/html; charset=UTF-8');

    I had an issue similar to the one you describe, and setting the headers properly was enough to fix it.

    Hope this helps!

  • 2020-12-15 02:57

    This seems to be a content negotiation problem as file_get_contents probably sends a request that only accepts ISO 8859-1 as character encoding.

    You can create a custom stream context for file_get_contents using stream_context_create that explicitly states that you accept UTF-8:

    $opts = array('http' => array('header' => 'Accept-Charset: UTF-8, *;q=0'));
    $context = stream_context_create($opts);
    
    $filename = "http://search.yahoo.com/search;_ylt=A0oG7lpgGp9NTSYAiQBXNyoA?p=naj%C5%A1%C5%A5astnej%C5%A1%C3%AD&fr2=sb-top&fr=yfp-t-701&type_param=&rd=pref";
    echo file_get_contents($filename, false, $context);
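A server is free to ignore Accept-Charset, so it may be worth having a fallback that converts the body when the response still declares something other than UTF-8. The sketch below assumes the mbstring extension; `to_utf8()` is a hypothetical helper, and the declared charset would have to come from the response headers of the real request.

```php
<?php
// Hypothetical helper: normalise a response body to UTF-8, given the
// charset the server declared. Assumes the mbstring extension is loaded.
function to_utf8(string $body, ?string $declared): string
{
    // Nothing declared, or already UTF-8: leave the bytes untouched.
    if ($declared === null || strcasecmp($declared, 'UTF-8') === 0) {
        return $body;
    }
    return mb_convert_encoding($body, 'UTF-8', $declared);
}

$latin1Body = "caf\xE9"; // "café" encoded as ISO-8859-1
echo bin2hex(to_utf8($latin1Body, 'ISO-8859-1')), "\n"; // 636166c3a9
```

This only repairs the encoding of characters the server actually sent; anything the server already replaced on its side stays lost.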
    