Simple html dom file_get_html not working - is there any workaround?

Submitted by 扶醉桌前 on 2019-11-26 09:44:43

Question


<?php
// Report all PHP errors (see changelog)
error_reporting(E_ALL);

include('inc/simple_html_dom.php');

// base url
$base = 'https://play.google.com/store/apps';

// home page HTML
$html_base = file_get_html($base);

// get all category links
foreach ($html_base->find('a') as $element) {
    echo "<pre>";
    print_r($element->href);
    echo "</pre>";
}

$html_base->clear();
unset($html_base);

?>

I have the above code and I'm trying to get certain elements of the Play Store page, but it isn't returning anything. Is it possible that certain PHP functions are disabled on the server and prevent this?

The above code works perfectly on other sites.

Is there any workaround?


Answer 1:


As I said, your example is working fine for me... But try this way using curl instead:

include('inc/simple_html_dom.php');

// base url
$base = 'https://play.google.com/store/apps';

// Fetch the page with cURL instead of file_get_html()
$curl = curl_init();
// Note: this disables certificate verification; acceptable for a quick test only
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_HEADER, false);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_URL, $base);
curl_setopt($curl, CURLOPT_REFERER, $base);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$str = curl_exec($curl);
if ($str === false) {
    // curl_exec() returns false on failure; report why before giving up
    die('cURL error: ' . curl_error($curl));
}
curl_close($curl);

// Create a DOM object
$html_base = new simple_html_dom();
// Load HTML from a string
$html_base->load($str);

// get all category links
foreach ($html_base->find('a') as $element) {
    echo "<pre>";
    print_r($element->href);
    echo "</pre>";
}

$html_base->clear();
unset($html_base);

It gets all the links as expected.

Also make sure the php_openssl and php_curl extensions are installed and enabled.
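
A quick way to confirm that both extensions are actually loaded is to ask PHP directly. This is only a sketch: the helper name missing_extensions is made up for this example and is not part of Simple HTML DOM. It should be run under the same PHP setup (web server or CLI) as the failing script, since the two can load different php.ini files.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): returns the names
// from $required that are not loaded in the running PHP build.
function missing_extensions(array $required): array
{
    $missing = [];
    foreach ($required as $name) {
        if (!extension_loaded($name)) {
            $missing[] = $name;
        }
    }
    return $missing;
}

// PHP reports these extensions under the names "openssl" and "curl".
$missing = missing_extensions(['openssl', 'curl']);
if ($missing) {
    echo 'Missing extensions: ' . implode(', ', $missing) . PHP_EOL;
} else {
    echo 'openssl and curl are both loaded.' . PHP_EOL;
}
```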




Answer 2:


In php.ini, remove the leading semicolon from the line below to enable the extension, then restart the Apache server:

; Windows Extensions
...
;extension=php_openssl.dll
...



Answer 3:


You must set "allow_url_fopen" to On in "php.ini" to allow accessing files via HTTP or FTP.
Some hosting vendors disable PHP's "allow_url_fopen" flag for security reasons.
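
To see whether this setting is the cause, it can be inspected at runtime. A minimal sketch, assuming it runs on the same server as the failing script; url_fopen_enabled is a hypothetical helper name. Note that allow_url_fopen can only be changed in php.ini or the server configuration, not with ini_set() inside the script.

```php
<?php
// Hypothetical helper: reports whether allow_url_fopen is enabled.
// ini_get() returns the value as a string (for example "1", "On", "" or "0"),
// so filter_var() is used to normalise it to a boolean.
function url_fopen_enabled(): bool
{
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

if (url_fopen_enabled()) {
    echo 'allow_url_fopen is enabled.' . PHP_EOL;
} else {
    echo 'allow_url_fopen is disabled; URLs cannot be opened as files.' . PHP_EOL;
}
```

If the flag is disabled and the host will not change it, fetching the page with cURL and passing the string to the parser avoids the URL wrappers entirely.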




Answer 4:


// $website and $Number are assumed to be defined earlier in the script
$post = curl_init();
curl_setopt($post, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($post, CURLOPT_AUTOREFERER, true);
curl_setopt($post, CURLOPT_HEADER, 0);
curl_setopt($post, CURLOPT_RETURNTRANSFER, true);
curl_setopt($post, CURLOPT_URL, $website);
curl_setopt($post, CURLOPT_POST, 1);
curl_setopt($post, CURLOPT_POSTFIELDS, "regno=$Number");
curl_setopt($post, CURLOPT_FOLLOWLOCATION, true);
$curlresponse = curl_exec($post);
// The HTTP status code is only available after curl_exec() has run
$httpCode = curl_getinfo($post, CURLINFO_HTTP_CODE);
curl_close($post);

$dom = new DOMDocument();
$dom->loadHTML($curlresponse);

DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseStartTag: misplaced
This is the URL: http://www.annauniv.edu/cgi-bin/result/cgrade.pl?regno=11210104001
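
The htmlParseStartTag message is a libxml parse warning, which DOMDocument emits when the markup is malformed. One common way to deal with it, sketched here under assumptions, is to collect the warnings with libxml_use_internal_errors() instead of letting them surface as PHP warnings. The $markup string is an invented stand-in for $curlresponse and is not the real response from the URL above.

```php
<?php
// Collect libxml parse warnings instead of emitting them as PHP warnings.
$previous = libxml_use_internal_errors(true);

// Invented stand-in for $curlresponse: deliberately malformed markup
// with a second <html> tag in the wrong place.
$markup = '<html><body><p>Result<html></p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($markup);

// Inspect what the parser complained about, then reset the buffer
// and restore the previous error-handling mode.
foreach (libxml_get_errors() as $error) {
    echo trim($error->message) . ' (line ' . $error->line . ')' . PHP_EOL;
}
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The document is still usable despite the warnings.
$paragraphs = $dom->getElementsByTagName('p');
echo 'Paragraphs found: ' . $paragraphs->length . PHP_EOL;
```

This hides the symptom rather than fixing the markup, so the collected messages are still worth reading when the parsed result looks wrong.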



Source: https://stackoverflow.com/questions/18667441/simple-html-dom-file-get-html-not-working-is-there-any-workaround
