file_get_contents script works with some websites but not others

轻奢々 2021-01-01 04:24

I'm looking to build a PHP script that parses HTML for particular tags. I've been using this code block, adapted from this tutorial:



        
3 Answers
  •  旧巷少年郎
    2021-01-01 05:04

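    // file_get_html() comes from the PHP Simple HTML DOM Parser library.
    include 'simple_html_dom.php';
    $html = file_get_html('http://google.com/');
    // find() returns an array unless an index is given; 0 selects the first match.
    $title = $html->find('title', 0)->innertext;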
    

    Or, if you prefer, use preg_match. Either way, you should really be using cURL instead of file_get_contents:

    // Fetch a URL with browser-like request headers.
    // Returns the response body as a string, or false on failure.
    function curl($url){
    
        $headers = array();
        $headers[]  = "User-Agent:Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13";
        $headers[]  = "Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        $headers[]  = "Accept-Language:en-us,en;q=0.5";
        $headers[]  = "Accept-Encoding:gzip,deflate";
        $headers[]  = "Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.7";
        $headers[]  = "Keep-Alive:115";
        $headers[]  = "Connection:keep-alive";
        $headers[]  = "Cache-Control:max-age=0";
    
        $curl = curl_init();
        curl_setopt($curl, CURLOPT_URL, $url);
        curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
        curl_setopt($curl, CURLOPT_ENCODING, "gzip");     // decode compressed responses
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);    // return the body instead of printing it
        curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);    // follow HTTP redirects
        $data = curl_exec($curl);
        curl_close($curl);
        return $data;
    
    }
    
    
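    $data = curl('http://www.google.com');
    // Capture the text between <title> and </title> (case-insensitive, may span lines).
    $regex = '#<title>(.*?)</title>#mis';
    // Only read $match[1] when the request succeeded and the pattern matched.
    if ($data !== false && preg_match($regex, $data, $match)) {
        var_dump($match);
        echo $match[1];
    }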
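
If the regular expression turns out to be too brittle (for example when the title tag carries attributes), one possible alternative is PHP's bundled DOM extension. The following is only a minimal sketch, not part of the original answer: `extract_title` is a hypothetical helper name, the sample markup is made up for illustration, and it assumes the DOM extension is enabled. It takes an HTML string, such as the one returned by `curl()` above, and returns the title text or null.

```php
<?php
// Hypothetical helper (an assumption, not from the original answer):
// extract the <title> text from an HTML string with DOMDocument.
function extract_title(string $html): ?string
{
    // loadHTML() rejects empty input, so bail out early.
    if (trim($html) === '') {
        return null;
    }

    $doc = new DOMDocument();

    // Real-world markup is often invalid; collect parser warnings
    // internally instead of emitting them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = $doc->getElementsByTagName('title');
    if ($nodes->length === 0) {
        return null;
    }

    return trim($nodes->item(0)->textContent);
}

// Made-up sample markup, used here in place of a live HTTP response.
$sample = '<html><head><title> Example Page </title></head><body><p>Hi</p></body></html>';
echo extract_title($sample) ?? '(no title found)', PHP_EOL;
```

Since `curl_exec` returns false on failure, it is probably worth checking the fetched data for false before handing it to a helper like this.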
    
