I'm trying to scrape an Excel file from a government "muster roll" database. However, the URL I have to access this Excel file:
http://nrega.ap.gov.in/Nregs/Front
Using requests, this is a trivial task:
>>> import requests
>>> url = 'http://httpbin.org/cookies/set/requests-is/awesome'
>>> r = requests.get(url)
>>> print r.cookies
{'requests-is': 'awesome'}
Using cookies and urllib2:
import cookielib
import urllib2
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# use opener to open different urls
You can use the same opener for several connections:
data = [opener.open(url).read() for url in urls]
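To see the shared opener actually carrying a cookie from one request to the next, here is a minimal sketch for Python 3, where urllib2 became urllib.request and cookielib became http.cookiejar. The local test server, its /login and /report paths, the cookie value, and the fetch_all helper are all made up for illustration; they stand in for the real site, which may behave differently.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class CookieEchoHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: sets a cookie on /login, echoes cookies back."""

    def do_GET(self):
        # The response body is whatever Cookie header the client sent.
        body = (self.headers.get("Cookie") or "").encode("ascii")
        self.send_response(200)
        if self.path == "/login":
            self.send_header("Set-Cookie", "session=abc123; Path=/")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_all(opener, urls):
    """Read every URL through the same opener so cookies carry over."""
    results = []
    for url in urls:
        with opener.open(url) as response:
            results.append(response.read())
    return results


# Bind to port 0 so the OS picks a free port for the throwaway server.
server = HTTPServer(("127.0.0.1", 0), CookieEchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_address[1]

cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),  # bypass any system proxy for the local server
    urllib.request.HTTPCookieProcessor(cj),
)

# The first request receives the cookie; the second sends it back.
data = fetch_all(opener, [base + "/login", base + "/report"])

server.shutdown()
server.server_close()
```

The second body contains the cookie only because both requests went through the same opener, and therefore the same CookieJar.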
Or install it globally:
urllib2.install_opener(opener)
In the latter case, the rest of the code looks the same with or without cookie support:
data = [urllib2.urlopen(url).read() for url in urls]
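For Python 3, the global install might look like the sketch below, assuming the standard renames (urllib2 to urllib.request, cookielib to http.cookiejar). The install_cookie_opener helper is a name invented here, not part of any library.

```python
import http.cookiejar
import urllib.request


def install_cookie_opener():
    """Install a cookie-aware opener as the process-wide default (hypothetical helper)."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # After this call, plain urllib.request.urlopen() goes through `opener`.
    urllib.request.install_opener(opener)
    return jar


jar = install_cookie_opener()

# `urls` is a placeholder for your own list of addresses:
# data = [urllib.request.urlopen(url).read() for url in urls]
```

Returning the jar keeps a handle on the stored cookies, which is handy for inspecting what the server set. Note that the install affects every urlopen call in the process, so the per-opener form is usually the safer choice in library code.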