Simple HTML DOM returning false

五迷三道 submitted on 2019-12-13 19:28:07

Question


I've run into something strange when using Simple HTML DOM to parse a webpage with certain query strings. I'm trying to parse the used car page of a dealership's website; some query strings work, but others do not. It seems that whenever there are more vehicles still to be shown, the HTML content is not returned (so the last page of the pagination works, but earlier pages don't). I've tried viewing the page with JavaScript disabled to see whether the markup differs, but the page seems to behave the same way. My code is below. Any ideas, or better yet solutions, would be appreciated. Thanks all!

require 'simple_html_dom.php';
error_reporting(E_ALL);

$startingURL = 'http://www.buickgmcofmilford.com/VehicleSearchResults?model=&certified=&location=&miles=&maxPrice=&minYear=&maxYear=&bodyType=&search=preowned&trim=&make=&pageNumber=2';

// file_get_html() returns a DOM object on success and false on failure.
$getHTML = file_get_html($startingURL);
if ($getHTML !== false) {
    echo '<h1>TRUE</h1>';
} else {
    echo '<h1>FALSE</h1>';
}
// Dump the result in either case to see what was returned.
var_dump($getHTML);

With the URL above, var_dump shows a boolean false. With the following URL, which differs only in pageNumber, I can parse the data without any issue: http://www.buickgmcofmilford.com/VehicleSearchResults?model=&certified=&location=&miles=&maxPrice=&minYear=&maxYear=&bodyType=&search=preowned&trim=&make=&pageNumber=5

Thanks.


Answer 1:


You should not rely on the default file_get_html function to fetch remote content. That function uses file_get_contents to download the page, and the target website will sometimes block such requests based on the user agent or referer. Try downloading the page with PHP cURL first, then parse the result with Simple HTML DOM.
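
The cURL suggestion above could look roughly like the following sketch. The helper name fetch_page, the user agent string, and the timeout value are illustrative assumptions, not part of Simple HTML DOM or of the original question; only the curl_* calls are standard PHP.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): download a page with
// cURL while sending browser-like headers. Returns the body as a string,
// or null if the transfer failed or the server answered with HTTP >= 400.
function fetch_page(string $url, string $userAgent = 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)', ?string $referer = null): ?string
{
    $ch = curl_init($url);
    $options = [
        CURLOPT_RETURNTRANSFER => true,       // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,       // follow HTTP redirects
        CURLOPT_USERAGENT      => $userAgent, // assumed value; adjust as needed
        CURLOPT_TIMEOUT        => 30,         // assumed value, in seconds
    ];
    if ($referer !== null) {
        $options[CURLOPT_REFERER] = $referer; // only sent when the caller supplies one
    }
    curl_setopt_array($ch, $options);

    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $error  = curl_error($ch);

    if ($body === false || $status >= 400) {
        // Log the reason so a failure is distinguishable from an empty page.
        error_log("Fetch failed for $url (HTTP $status): $error");
        return null;
    }
    return $body;
}

// Usage sketch (requires simple_html_dom.php on the include path):
// require 'simple_html_dom.php';
// $body = fetch_page($startingURL);
// echo 'Downloaded ' . strlen((string) $body) . " bytes\n";
// $html = ($body !== null) ? str_get_html($body) : false;
// var_dump($html !== false);
```

Printing the downloaded length is deliberate. In the versions of Simple HTML DOM I am aware of, str_get_html also returns false when the input is empty or longer than the library's MAX_FILE_SIZE constant, so if the download succeeds but parsing still yields false, the page size is worth checking against that limit.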



Source: https://stackoverflow.com/questions/35296312/simple-html-dom-returning-false
