Question
This script works fine when fetching google.com but not google.com/search?q=test. Without CURLOPT_FOLLOWLOCATION, I get a 302 Moved response. With it, I get a page asking me to enter a captcha. I've tried several different U.S.-based proxies and have varied the user agent string. Is there something I'm missing here?
function my_fetch($url, $proxy, $user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.8) Gecko/2009032609 Firefox/3.0.8')
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_PROXY, $proxy);
    curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_REFERER, 'http://www.google.com/');
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}

$url = 'http://www.google.com/search?q=test';
$proxy = '152.26.53.4:80';
echo my_fetch($url, $proxy);
Please don't respond with suggestions to use the API instead. The API is not sufficient for my needs.
Answer 1:
Google no longer works with cURL.
Google no longer allows access through cURL, so a request may return a 302 Moved response. If you want to query it programmatically, you have to use the API.
Thanks
Answer 2:
You can try to do that with PhantomJS:
var page = require("webpage").create();
var homePage = "http://www.google.com/";
page.open(homePage);
page.onLoadFinished = function(status) {
    var url = page.url;
    console.log("Status: " + status);
    console.log("Loaded: " + url);
    page.includeJs("http://code.jquery.com/jquery-1.8.3.min.js", function() {
        console.log("Loaded jQuery!");
        page.evaluate(function() {
            var searchBox = $(".lst");
            var searchForm = $("form");
            searchBox.val("your query");
            searchForm.submit();
        });
    });
    window.setTimeout(
        function () {
            page.render('google.png');
            phantom.exit(0);
        },
        1000 // wait 1,000 ms (1 s) before rendering
    );
};
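The fixed 1000 ms delay in the script above may fire before the results page has loaded, or wait longer than necessary. One alternative is to poll for a condition and render once it holds. The following is a minimal sketch of that idea, not part of the original answer: `waitFor` and its parameters are hypothetical names, not a PhantomJS API, and the `#search` selector in the commented usage is an assumption about the results page markup.

```javascript
// Hypothetical helper (not a PhantomJS API): poll `condition` every
// `intervalMs` until it returns a truthy value or `timeoutMs` elapses.
// `onDone` is called once, with true if the condition was met and
// false if the timeout was reached first.
function waitFor(condition, onDone, timeoutMs, intervalMs) {
    var start = new Date().getTime();
    var timer = setInterval(function () {
        var met = false;
        try {
            met = Boolean(condition());
        } catch (e) {
            met = false; // treat an exception as "not ready yet"
        }
        var timedOut = new Date().getTime() - start >= timeoutMs;
        if (met || timedOut) {
            clearInterval(timer);
            onDone(met);
        }
    }, intervalMs || 100);
    return timer;
}

// Possible use inside the PhantomJS script, in place of window.setTimeout.
// The "#search" selector is an assumption, not taken from the original answer.
//
// waitFor(function () {
//     return page.evaluate(function () {
//         return document.querySelector("#search") !== null;
//     });
// }, function (found) {
//     if (found) { page.render("google.png"); }
//     phantom.exit(found ? 0 : 1);
// }, 5000);
```

Polling with a timeout keeps the script from hanging if a captcha page appears instead of results. In that case `found` is false and the script exits with a nonzero status.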
Source: https://stackoverflow.com/questions/8609962/having-trouble-using-curl-and-php-to-get-google-search-results-through-a-proxy