Scrape a web page that requires a session cookie to be set first


I'm trying to scrape an Excel file from a government "muster roll" database. However, the URL I have to access this Excel file:

http://nrega.ap.gov.in/Nregs/Front


2 Answers

野趣味

2020-12-13 23:20

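Using requests, this is straightforward. A Session stores the cookies it receives and sends them back on later requests:

>>> import requests
>>> session = requests.Session()
>>> url = 'http://httpbin.org/cookies/set/requests-is/awesome'
>>> r = session.get(url)

>>> print(session.cookies.get_dict())
{'requests-is': 'awesome'}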


一个人的身影

2020-12-13 23:29

Using cookielib and urllib2 (Python 2):

import cookielib
import urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# use opener to open different urls

You can use the same opener for several connections:

data = [opener.open(url).read() for url in urls]

Or install it globally:

urllib2.install_opener(opener)

In the latter case, the rest of the code looks the same with or without cookie support:

data = [urllib2.urlopen(url).read() for url in urls]
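
In Python 3, cookielib and urllib2 became http.cookiejar and urllib.request, so the same idea can be sketched as below. This is only an illustration under stated assumptions: the local test server, its /front and /report paths, and the cookie value are hypothetical stand-ins for a site that issues a session cookie on a landing page and requires it for the download. The helper names fetch_with_session and run_demo are made up for this example.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a site that hands out a session cookie."""

    def do_GET(self):
        if self.path == "/front":
            # Landing page: issue the session cookie.
            self.send_response(200)
            self.send_header("Set-Cookie", "session=abc123; Path=/")
            body = b"welcome"
        elif "session=abc123" in self.headers.get("Cookie", ""):
            # Protected resource: served only when the cookie comes back.
            self.send_response(200)
            body = b"report data"
        else:
            self.send_response(403)
            body = b"no session"
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_with_session(base_url):
    """Visit the landing page first, then fetch the protected resource."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    # First request: the server's Set-Cookie header is stored in the jar.
    opener.open(base_url + "/front").read()
    # Second request: the same opener sends the stored cookie back.
    data = opener.open(base_url + "/report").read()
    return data, {cookie.name: cookie.value for cookie in jar}


def run_demo():
    """Start the throwaway local server and run the two-step fetch."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base_url = "http://127.0.0.1:%d" % server.server_port
    try:
        return fetch_with_session(base_url)
    finally:
        server.shutdown()
        server.server_close()
```

Against a real site, only fetch_with_session would be needed, with the landing page URL and the file URL in place of the two demo paths. urllib.request.install_opener(opener) is also available if the opener should apply to every urllib.request.urlopen call.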
