Scrape a web page that requires the site to give you a session cookie first

社会主义新天地 posted on 2019-11-28 19:57:43

Using requests, this is a trivial task:

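>>> import requests
>>> url = 'http://httpbin.org/cookies/set/requests-is/awesome'
>>> s = requests.Session()  # a Session keeps cookies across redirects and later requests
>>> r = s.get(url)

>>> print(s.cookies.get_dict())
{'requests-is': 'awesome'}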

Using only the standard library, with cookielib and urllib2 (named http.cookiejar and urllib.request in Python 3):

import cookielib
import urllib2

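# the jar stores every cookie the server sets
cj = cookielib.CookieJar()
# the opener saves cookies into the jar and sends them back on later requests
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# use opener to open different urls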

You can reuse the same opener for several requests, so a cookie set by one response is sent back with the next:

data = [opener.open(url).read() for url in urls]
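
As a rough illustration of the cookie round trip, here is a minimal Python 3 sketch using http.cookiejar and urllib.request. The local test server, the CookieDemoHandler class, the /login and /data paths and the session=abc123 value are all hypothetical, made up so the example can run without an external site. The empty ProxyHandler is only there so that a system proxy does not intercept the loopback requests.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class CookieDemoHandler(BaseHTTPRequestHandler):
    """Hypothetical site: /login hands out a session cookie, other paths require it."""

    def do_GET(self):
        if self.path == "/login":
            self.send_response(200)
            self.send_header("Set-Cookie", "session=abc123; Path=/")
            body = b"logged in"
        elif "session=abc123" in self.headers.get("Cookie", ""):
            self.send_response(200)
            body = b"secret data"
        else:
            self.send_response(403)
            body = b"no session cookie"
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


# Start the hypothetical server on a free loopback port.
server = HTTPServer(("127.0.0.1", 0), CookieDemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

# Same pattern as above, with the Python 3 module names.
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),  # assumption: bypass any system proxy
    urllib.request.HTTPCookieProcessor(cj),
)

# The first request receives the cookie, the second one sends it back.
urls = [base + "/login", base + "/data"]
data = [opener.open(url).read() for url in urls]

server.shutdown()
server.server_close()
```

After the run, iterating over cj yields the stored cookie objects, which is a convenient way to check what the site actually set.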

Alternatively, install the opener globally:

urllib2.install_opener(opener)

In the latter case, the rest of the code looks the same with or without cookie support:

data = [urllib2.urlopen(url).read() for url in urls]
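
The global variant might look like this in Python 3. This is only a sketch: the VisitHandler server and its visited=1 cookie are hypothetical stand-ins for a real site, and the empty ProxyHandler is an assumption to keep loopback traffic away from any system proxy. Once install_opener has been called, plain urllib.request.urlopen calls share the cookie jar.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class VisitHandler(BaseHTTPRequestHandler):
    """Hypothetical site: sets a cookie on the first visit and recognises it afterwards."""

    def do_GET(self):
        seen = "visited=1" in self.headers.get("Cookie", "")
        body = b"welcome back" if seen else b"first visit"
        self.send_response(200)
        if not seen:
            self.send_header("Set-Cookie", "visited=1; Path=/")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


# Start the hypothetical server on a free loopback port.
server = HTTPServer(("127.0.0.1", 0), VisitHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),  # assumption: bypass any system proxy
    urllib.request.HTTPCookieProcessor(cj),
)
# From here on, urllib.request.urlopen uses this opener and its cookie jar.
urllib.request.install_opener(opener)

first = urllib.request.urlopen(url).read()
second = urllib.request.urlopen(url).read()

server.shutdown()
server.server_close()
```

Note that install_opener changes process-wide state, so every later urlopen call in the program shares the same cookies; passing the opener around explicitly keeps that scope narrower.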