What is the practical difference between these two ways of making web connections in Python?

Submitted by 若如初见 on 2019-12-03 21:29:10

Under the hood, requests uses urllib3 to do most of the HTTP heavy lifting. Used properly, the two behave almost identically unless you need more advanced configuration.

Except, in your particular example they're not the same:

In the urllib3 example you're re-using connections, whereas in the requests example you're not: each top-level requests.get() call opens a fresh connection. Here's how you can tell:

>>> import requests
>>> requests.packages.urllib3.add_stderr_logger()
2016-04-29 11:43:42,086 DEBUG Added a stderr logging handler to logger: requests.packages.urllib3
>>> requests.get('https://www.google.com/')
2016-04-29 11:45:59,043 INFO Starting new HTTPS connection (1): www.google.com
2016-04-29 11:45:59,158 DEBUG "GET / HTTP/1.1" 200 None
>>> requests.get('https://www.google.com/')
2016-04-29 11:45:59,815 INFO Starting new HTTPS connection (1): www.google.com
2016-04-29 11:45:59,925 DEBUG "GET / HTTP/1.1" 200 None

To start re-using connections like in a urllib3 PoolManager, you need to make a requests session.

>>> session = requests.Session()
>>> session.get('https://www.google.com/')
2016-04-29 11:46:49,649 INFO Starting new HTTPS connection (1): www.google.com
2016-04-29 11:46:49,771 DEBUG "GET / HTTP/1.1" 200 None
>>> session.get('https://www.google.com/')
2016-04-29 11:46:50,548 DEBUG "GET / HTTP/1.1" 200 None

Now it's equivalent to what you were doing with http = PoolManager(). One more note: urllib3 is a lower-level, more explicit library, so you create the pool yourself and you also have to specify things like your SSL certificate location yourself. That's an extra line or two of work, but it also gives you a fair bit more control if that's what you're looking for.
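
To make the "more control" point concrete, here is a minimal sketch of an explicitly configured pool. The timeout, retry, and pool-size values are illustrative assumptions, not recommendations from the original answer; certifi is assumed to be installed (it ships as a dependency of requests).

```python
import certifi
import urllib3

# Every setting is spelled out here rather than left to defaults.
# The numeric values are placeholders chosen for illustration only.
http = urllib3.PoolManager(
    cert_reqs="CERT_REQUIRED",   # refuse servers whose certificate fails verification
    ca_certs=certifi.where(),    # path to the CA bundle provided by certifi
    timeout=urllib3.Timeout(connect=2.0, read=5.0),
    retries=urllib3.Retry(total=3, backoff_factor=0.5),
    maxsize=4,                   # connections kept alive per host
)
```

With requests, most of these knobs are preset or hidden behind the session's transport adapters, which is convenient until you need to change one of them.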

All said and done, the comparison becomes:

1) Using urllib3:

import urllib3, certifi
from bs4 import BeautifulSoup
http = urllib3.PoolManager(ca_certs=certifi.where())
# url is the page to fetch; .data is the fully read response body
html = http.request('GET', url).data
soup = BeautifulSoup(html, "html5lib")

2) Using requests:

import requests
from bs4 import BeautifulSoup
session = requests.Session()
# url is the page to fetch; .content is the response body as bytes
html = session.get(url).content
soup = BeautifulSoup(html, "html5lib")
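
If you'd rather verify the connection re-use claim without reading log output, one option is to count TCP connections on a throwaway local server. The sketch below is only an illustration: CountingHandler, count_connections, without_session, and with_session are hypothetical names made up for this example, and it assumes no HTTP proxy is configured in the environment.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class CountingHandler(BaseHTTPRequestHandler):
    """Answers every GET with 'ok' and counts accepted TCP connections."""

    protocol_version = "HTTP/1.1"  # lets the server keep connections alive
    connections = 0
    lock = threading.Lock()

    def setup(self):
        # setup() runs once per accepted connection, not once per request.
        with CountingHandler.lock:
            CountingHandler.connections += 1
        super().setup()

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def count_connections(fetch_twice):
    """Call fetch_twice(url) against a local server and return the
    number of TCP connections that server accepted."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        fetch_twice("http://127.0.0.1:%d/" % server.server_port)
        return CountingHandler.connections
    finally:
        server.shutdown()
        server.server_close()


def without_session(url):
    # Each top-level call builds and discards its own session.
    requests.get(url)
    requests.get(url)


def with_session(url):
    # One session, so the pooled connection can be re-used.
    with requests.Session() as session:
        session.get(url)
        session.get(url)


if __name__ == "__main__":
    print("without session:", count_connections(without_session))
    print("with session:", count_connections(with_session))
```

Under those assumptions the first count should come out as 2 and the second as 1, matching the log output shown earlier.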