I am trying to get the first 20 results from a Google search.
When I use urllib2.urlopen(), it gives me an error and says I am forbidden.
I heard that it has something to do with faking your user agent string, but I have next to no urllib2 experience and would be very grateful if anyone could help.
Thanks, giodamelio
You should probably just use a library that does all the hard work.
xGoogle lets you get the search results as a list.
From its examples:
from xgoogle.search import GoogleSearch
gs = GoogleSearch("quick and dirty")
gs.results_per_page = 50
results = gs.get_results()
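Since the question asks for the first 20 results, here is a rough sketch of collecting results across pages. It assumes that each call to get_results() returns the next page and that an empty list means there is nothing left, which is how I understand xgoogle to behave; check this against the version you install. The helper name first_n_results is made up for this example.

```python
def first_n_results(searcher, n=20):
    """Collect up to n results by calling searcher.get_results() page by page."""
    collected = []
    while len(collected) < n:
        page = searcher.get_results()
        if not page:  # assumption: an empty page means no more results
            break
        collected.extend(page)
    return collected[:n]


# Hypothetical usage with xgoogle (not executed here):
# from xgoogle.search import GoogleSearch
# gs = GoogleSearch("quick and dirty")
# gs.results_per_page = 50
# top20 = first_n_results(gs, 20)
```

The helper only relies on a get_results() method, so it can be tried with a stand-in object before pointing it at the real library.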
There are basically two ways: accessing Google's API directly, or using the xGoogle package.
Google's own API
Google's JSON/Atom API requires you to get an account and a key. It is the standard and preferred way to run searches in an automated way, which means you won't be banned from their service. The request is quite simple (quoting Google's own example):
GET https://www.googleapis.com/customsearch/v1?
key=INSERT-YOUR-KEY&cx=017576662512468239146:omuauf_lfve&q=lectures
You'll get a JSON response, which is easy to process in Python, for example with the standard json module.
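A minimal sketch of building the request URL and pulling the links out of the response. It assumes the results sit in an "items" list whose entries carry a "link" field, and that a "start" parameter selects the 1-based index of the first result; verify both against Google's current documentation. As far as I know each request returns at most 10 results, so the first 20 would take two requests (start=1 and start=11). The function names and the sample response are made up for illustration.

```python
import json

try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(key, cx, query, start=1):
    """Build the request URL; start is the 1-based index of the first result."""
    params = [("key", key), ("cx", cx), ("q", query), ("start", start)]
    return API_ENDPOINT + "?" + urlencode(params)


def extract_links(response_text):
    """Return the 'link' of every entry under 'items' in a JSON response."""
    data = json.loads(response_text)
    return [item["link"] for item in data.get("items", [])]


# A trimmed, made-up response used only to show the parsing step.
sample = '{"items": [{"title": "Lecture 1", "link": "http://example.com/1"}]}'
print(extract_links(sample))
print(build_search_url("INSERT-YOUR-KEY", "017576662512468239146:omuauf_lfve", "lectures"))
```

Fetching the URL itself is left out here so the example runs without a key; any HTTP client will do for that step.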
xGoogle
The xgoogle package is somewhat faster (see Lakshman Prasad's answer), but it might be blocked by Google (or worse, return wrong or empty answers), causing your program to stop working.
Pros and cons
If you just need to get some searches done for a project, use xGoogle. If your program needs to last longer, and you don't want your searches to get blocked, spend the 15 minutes required to use their API.
import urllib2

# url is the address you want to fetch, e.g. the Google search URL
req = urllib2.Request(url)
req.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3')
response = urllib2.urlopen(req)
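The same idea wrapped in a small helper, as a sketch rather than a guaranteed fix: Google may still refuse the request whatever the User-Agent is. The imports fall back between Python 3 (urllib.request) and Python 2 (urllib2), and the names make_request and fetch are made up for this example.

```python
try:
    from urllib.request import Request, urlopen  # Python 3
    from urllib.error import HTTPError
except ImportError:
    from urllib2 import Request, urlopen, HTTPError  # Python 2

USER_AGENT = ("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.3) "
              "Gecko/2008092417 Firefox/3.0.3")


def make_request(url, user_agent=USER_AGENT):
    """Build a Request that carries a browser-like User-Agent header."""
    req = Request(url)
    req.add_header("User-Agent", user_agent)
    return req


def fetch(url):
    """Return the response body, or None if the server answers with an HTTP error."""
    try:
        return urlopen(make_request(url)).read()
    except HTTPError as err:
        # A 403 here is the "forbidden" error from the question.
        print("Request failed with HTTP status %d" % err.code)
        return None
```

Keeping the header logic in make_request means you can inspect the request before sending anything over the network.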
Source: https://stackoverflow.com/questions/4363861/returning-google-search-to-python