PHP Curl UTF-8 Charset

囚心锁ツ 2020-12-05 06:54

I have a PHP script that fetches another web page and writes out all of that page's HTML. Everything works, except that there is a charset problem. My php file encoding is utf-

6 Answers
  • 2020-12-05 07:29

    You can send this header:

       header('Content-type: text/html; charset=UTF-8');
    

    and then decode the string:

     $page = utf8_decode(curl_exec($ch));
    

    It worked for me.

  • 2020-12-05 07:33

    First method (internal function)

    The best approach I have tried is urlencode(). Do not apply it to the whole URL; encode only the parts that need it. For example, if a request has two fields, 'text-fa' and 'text-en', holding Persian and English text respectively, you may only need to encode the Persian text, not the English one.
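As a minimal sketch of encoding only the needed part: the endpoint URL and both field values below are hypothetical, chosen for illustration, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical endpoint and field values, for illustration only.
$base   = 'https://example.com/search';
$textFa = 'سلام دنیا'; // Persian text: must be percent-encoded
$textEn = 'hello';     // plain ASCII: safe to send as-is

// Encode only the value that contains non-ASCII characters,
// not the whole URL (that would also escape ':', '/', '?' and '&').
$url = $base . '?text-fa=' . urlencode($textFa) . '&text-en=' . $textEn;

echo $url, "\n";
```

When every value may contain arbitrary text, http_build_query() encodes all of them in one call and is usually the less error-prone choice.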

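    Second method (using a cURL option)

    However, there are better ways if the range of characters that has to be encoded is more limited. One of them is to pass CURLOPT_ENCODING to curl_setopt(). Note that this option sets the Accept-Encoding request header and lets cURL decompress the response (an empty string accepts every encoding the cURL build supports); it does not convert character sets: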

    curl_setopt($ch, CURLOPT_ENCODING, "");
    
  • 2020-12-05 07:45
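    function page_title($url) {
        // simple_html_dom provides str_get_html(); require_once avoids
        // redeclaring its functions when page_title() is called more than once.
        require_once dirname(__FILE__) . '/simple_html_dom.php';

        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0');
        curl_setopt($ch, CURLOPT_ENCODING, "gzip"); // accept gzip-compressed responses
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        $return = curl_exec($ch);
        $contentType = (string) curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
        curl_close($ch);

        if ($return === false) {
            return false; // the request failed
        }
        $html = str_get_html($return);
        if (!$html) {
            return false; // the response could not be parsed
        }

        // 1) Try the charset from the HTTP Content-Type header.
        $charset = null;
        if (preg_match('/charset=([\w-]+)/i', $contentType, $found)) {
            $charset = $found[1];
        } else {
            // 2) Fall back to <meta http-equiv="Content-Type" content="...; charset=...">.
            $meta = $html->find('meta[http-equiv=Content-Type]', 0);
            if ($meta && preg_match('/charset=([\w-]+)/i', $meta->content, $found)) {
                $charset = $found[1];
            }
        }

        $titleNode = $html->find('title', 0);
        if (!$titleNode) {
            return false; // the page has no <title>
        }
        $title = $titleNode->innertext;

        // Convert only when a charset was found and it is not already UTF-8.
        if ($charset !== null && strcasecmp($charset, 'UTF-8') !== 0) {
            $title = mb_convert_encoding($title, 'UTF-8', $charset);
        }

        return $title;
    }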
    
  • 2020-12-05 07:49

    Simple: cURL returns the bytes exactly as the server sent them, so if the page is UTF-8 and your output is ISO-8859-1, you just need to decode the string.

    Description
    
    string utf8_decode ( string $data )
    

    This function decodes data, assumed to be UTF-8 encoded, to ISO-8859-1.
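Since utf8_decode() is deprecated as of PHP 8.2, the same conversion can be sketched with mb_convert_encoding(), assuming the mbstring extension is available. The sample string is made up for illustration and is written with byte escapes so that the result does not depend on how the file is saved.

```php
<?php
// "café" as UTF-8 bytes (illustrative sample input).
$utf8 = "caf\xC3\xA9";

// Same result as utf8_decode($utf8): UTF-8 in, ISO-8859-1 out.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

echo bin2hex($latin1), "\n"; // prints 636166e9
```

Characters that have no ISO-8859-1 equivalent cannot survive this conversion, so it only makes sense when the output page really is served as ISO-8859-1.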

  • 2020-12-05 07:49
    $output = curl_exec($ch);
    $result = iconv("Windows-1251", "UTF-8", $output);
    
  • 2020-12-05 07:53

    I was fetching a Windows-1252 encoded file via cURL, and mb_detect_encoding(curl_exec($ch)) returned UTF-8. I tried utf8_encode(curl_exec($ch)) instead, and the characters came out correct.
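Because detection can guess wrong, one hedged alternative is to test whether the bytes are valid UTF-8 and, if not, convert from an explicitly assumed source charset. The helper name to_utf8() and its Windows-1252 default are assumptions made for this sketch, and the mbstring extension is required. Note that utf8_encode() treats its input as ISO-8859-1, which differs from Windows-1252 in the 0x80-0x9F range (the euro sign, for example).

```php
<?php
// Hypothetical helper: returns $bytes unchanged when they are already valid
// UTF-8, otherwise converts them from the assumed source charset.
function to_utf8(string $bytes, string $assumed = 'Windows-1252'): string
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    return mb_convert_encoding($bytes, 'UTF-8', $assumed);
}

// "café" in Windows-1252 is not valid UTF-8, so it gets converted.
echo bin2hex(to_utf8("caf\xE9")), "\n";     // prints 636166c3a9
// Already valid UTF-8: passed through untouched.
echo bin2hex(to_utf8("caf\xC3\xA9")), "\n"; // prints 636166c3a9
```

In practice the argument would be the return value of curl_exec($ch), and the assumed charset would ideally come from the response's Content-Type header rather than a hard-coded default.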
