curl not working for getting a web page content, why?

喜你入骨 提交于 2019-12-11 03:58:22

问题


i am using a curl script to go to a link and get its content for further manipulation. following is the link and curl script:

<?php 
$url = 'http://criminaljustice.state.ny.us/cgi/internet/nsor/fortecgi?serviceName=WebNSOR&amp;templateName=detail.htm&amp;requestingHandler=WebNSORDetailHandler&amp;ID=368343543';

//curl script to get content of given url

$ch = curl_init();

// set the target url

curl_setopt($ch, CURLOPT_URL,$url);

// request as if Firefox

curl_setopt($ch, CURLOPT_HTTPHEADER, Array("User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.15) Gecko/20080623 Firefox/2.0.0.15") ); 
curl_setopt($ch, CURLOPT_NOBODY, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result= curl_exec ($ch);
curl_close ($ch);
echo $result;
?>

but the website is not excepting it through script it is giving user exception in result, but if we normally paste the url in browser it is opening the page perfectly alright.

Please help, what i am doing wrong here.

Thanks and regards


回答1:


I ran the following program/script and the page was downloaded correctly. This most likely means the server you're running your script from can't reach the server at "criminaljustice.state.ny.us". This is either because your server is mis-configured, or their server is explicitly blocking you, which is a common result of aggressive screen scraping.

<?php
$url = 'http://criminaljustice.state.ny.us/cgi/internet/nsor/fortecgi?serviceName=WebNSOR&templateName=detail.htm&requestingHandler=WebNSORDetailHandler&ID=368343543';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_HTTPHEADER, Array("User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.15) Gecko/20080623 Firefox/2.0.0.15") ); 
curl_setopt($ch, CURLOPT_NOBODY, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result= curl_exec ($ch);
curl_close ($ch);
echo $result;

Additional troubleshooting tip -- if you have shell access to the machine your PHP script is running from, run the following command

curl -I 'http://criminaljustice.state.ny.us/cgi/internet/nsor/fortecgi?serviceName=WebNSOR&templateName=detail.htm&requestingHandler=WebNSORDetailHandler&ID=368343543'

This will output the response headers, which may contain some clue as to why your request is failing.




回答2:


For useragent i think you want to use the CURLOPT_USERAGENT constant

curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");



回答3:


I had the same issue which ended up being the followlocation option not being set. I thought curl would set it to true by default but I guess not!? Once I set it it got the full site no problem




回答4:


Is the user agent meant to be in an array like that? I haven't seen it done like that before.

Try just using a plain string, i.e.

curl_setopt($ch, CURLOPT_HTTPHEADER, 'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.15) Gecko/20080623 Firefox/2.0.0.15'); 


来源:https://stackoverflow.com/questions/814149/curl-not-working-for-getting-a-web-page-content-why

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!