Understanding requests versus grequests

Submitted by 送分小仙女 on 2019-12-21 03:55:18

Question


I'm working with a process which is basically as follows:

  1. Take some list of urls.
  2. Get a Response object from each.
  3. Create a BeautifulSoup object from the text of each Response.
  4. Pull the text of a certain tag from that BeautifulSoup object.
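The four steps above can be sketched as a small sequential pipeline. This is a minimal sketch, not the asker's code. The fetcher is passed in as a callable (for example `lambda u: requests.get(u).text`) so the parsing can be tried offline. The hypothetical helper `first_tag_text` uses the standard library's `html.parser` as a dependency-free stand-in for BeautifulSoup's `soup.find(tag).get_text()`. Tag names should be given in lowercase because `HTMLParser` lowercases them.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, Iterable, List, Optional


class _FirstTagText(HTMLParser):
    """Collects the text inside the first <tag>...</tag> (stand-in for BeautifulSoup)."""

    def __init__(self, tag: str) -> None:
        super().__init__()
        self.tag = tag
        self.found = False   # seen an opening <tag> yet?
        self.depth = 0       # > 0 while inside the first matching element
        self.done = False    # True once that element has closed
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag != self.tag:
            return
        self.found = True
        self.depth += 1

    def handle_endtag(self, tag):
        if self.done or tag != self.tag or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth > 0 and not self.done:
            self.parts.append(data)


def first_tag_text(html: str, tag: str) -> Optional[str]:
    """Steps 3-4: parse the page and return the first <tag>'s text, or None if absent."""
    parser = _FirstTagText(tag)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip() if parser.found else None


def scrape(urls: Iterable[str], fetch: Callable[[str], str], tag: str) -> Dict[str, Optional[str]]:
    """Steps 1-4, one URL at a time, like the plain-requests version in the question."""
    results: Dict[str, Optional[str]] = {}
    for url in urls:                              # step 1: the list of URLs
        html = fetch(url)                         # step 2: fetch the page body
        results[url] = first_tag_text(html, tag)  # steps 3-4: parse and pull the tag text
    return results


if __name__ == "__main__":
    # Real use (assumed): scrape(urls, lambda u: requests.get(u).text, "title")
    pages = {
        "u1": "<html><head><title> A </title></head></html>",
        "u2": "<p>no title here</p>",
    }
    print(scrape(pages, pages.__getitem__, "title"))  # {'u1': 'A', 'u2': None}
```

Passing the fetcher in keeps the network step separate from the parsing, which also makes it easy to swap the sequential loop for a concurrent one later.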

From my understanding, this seems ideal for grequests:

GRequests allows you to use Requests with Gevent to make asynchronous HTTP Requests easily.

However, the two approaches (one with requests, one with grequests) give me different results: with grequests, some of the requests return None rather than a Response.

Using requests

import requests

tickers = [
    'A', 'AAL', 'AAP', 'AAPL', 'ABBV', 'ABC', 'ABT', 'ACN', 'ADBE', 'ADI', 
    'ADM',  'ADP', 'ADS', 'ADSK', 'AEE', 'AEP', 'AES', 'AET', 'AFL', 'AGN', 
    'AIG', 'AIV', 'AIZ', 'AJG', 'AKAM', 'ALB', 'ALGN', 'ALK', 'ALL', 'ALLE',
    ]

BASE = 'https://finance.google.com/finance?q={}'

rs = (requests.get(u) for u in [BASE.format(t) for t in tickers])
rs = list(rs)

rs
# [<Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # ...
 # <Response [200]>]

# All are okay (status_code == 200)

Using grequests

# Restarted my interpreter and redefined `tickers` and `BASE`
import grequests

rs = (grequests.get(u) for u in [BASE.format(t) for t in tickers])
rs = grequests.map(rs)

rs
# [None,
 # <Response [200]>,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>]

Why the difference in results?
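When diagnosing this, it helps that `grequests.map()` returns its results in the same order as the requests it was given, with `None` in each slot whose request raised an exception (when an `exception_handler` is supplied, that slot holds the handler's return value instead). Keeping the URL list around therefore shows exactly which tickers failed. A minimal sketch: `split_results` is a hypothetical helper, and the responses are faked so it runs offline.

```python
from typing import Any, List, Optional, Sequence, Tuple


def split_results(
    urls: Sequence[str], responses: Sequence[Optional[Any]]
) -> Tuple[List[Tuple[str, Any]], List[str]]:
    """Pair each URL with map()'s result at the same position; None marks a failure."""
    if len(urls) != len(responses):
        raise ValueError("expected one result per URL, in the same order")
    ok: List[Tuple[str, Any]] = []
    failed: List[str] = []
    for url, resp in zip(urls, responses):
        if resp is None:
            failed.append(url)
        else:
            ok.append((url, resp))
    return ok, failed


if __name__ == "__main__":
    # Real use (assumed): urls = [BASE.format(t) for t in tickers]
    #                     responses = grequests.map(grequests.get(u) for u in urls)
    urls = ["?q=A", "?q=AAL", "?q=AAP"]
    responses = [None, "<Response [200]>", None]  # fake, shaped like the output above
    ok, failed = split_results(urls, responses)
    print(failed)  # ['?q=A', '?q=AAP']
```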

Update: I can print the exceptions by passing an exception handler, as follows. There is a related discussion elsewhere, but I have no idea what's going on.

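def exception_handler(request, exception):
    # grequests.map() calls this for every request that raised an exception
    print(exception)

# Re-create the requests: after the earlier map() call, rs holds responses, not requests
rs = (grequests.get(u) for u in [BASE.format(t) for t in tickers])
rs = grequests.map(rs, exception_handler=exception_handler)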

# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
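Printing is enough for a first look, but a handler can also record which URLs failed so that those requests can be retried afterwards, for example one at a time with plain requests. A minimal sketch: the `FailureLog` class is hypothetical, and it relies only on the handler being called as `exception_handler(request, exception)` and on the grequests request object exposing `.url`. `SimpleNamespace` and `OSError` stand in for the real request object and exception so the example runs offline.

```python
from types import SimpleNamespace
from typing import List, Tuple


class FailureLog:
    """Callable usable as grequests' exception_handler(request, exception)."""

    def __init__(self) -> None:
        self.failures: List[Tuple[str, str]] = []

    def __call__(self, request, exception):
        # Record which URL failed and why; returning None leaves None in map()'s slot.
        self.failures.append((request.url, repr(exception)))
        return None

    def failed_urls(self) -> List[str]:
        return [url for url, _ in self.failures]


if __name__ == "__main__":
    log = FailureLog()
    # Real use (assumed): responses = grequests.map(reqs, exception_handler=log)
    fake_request = SimpleNamespace(url="https://example.com/?q=A")
    log(fake_request, OSError("bad handshake: SysCallError(-1, 'Unexpected EOF')"))
    print(log.failed_urls())  # ['https://example.com/?q=A']
```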

System/version info

  • requests: 2.18.4
  • grequests: 0.3.0
  • Python: 3.6.3
  • urllib3: 1.22
  • pyopenssl: 17.2.0
  • All via Anaconda
  • System: same issue on both macOS High Sierra and Windows 10, build 10.0.16299

Answer 1:


You are just sending requests too fast. Because grequests is an asynchronous library, all of these requests are sent almost simultaneously (by default grequests.map() uses size=None, which puts no limit on concurrency), and that is too many at once.

You just need to limit the number of concurrent requests with grequests.map(rs, size=your_choice). I tested grequests.map(rs, size=10) and it works well.
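The effect of capping concurrency can be reproduced without gevent or network access. This sketch swaps in the standard library's `ThreadPoolExecutor` as an analogy for grequests' gevent pool; it is not how grequests is implemented. With `max_workers=10`, at most ten calls to the hypothetical fake fetcher overlap, and `Executor.map` still returns results in input order, much like `grequests.map`.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List


class ConcurrencyProbe:
    """Hypothetical fake fetcher that records how many calls overlap in time."""

    def __init__(self, delay: float = 0.01) -> None:
        self._lock = threading.Lock()
        self._in_flight = 0
        self.max_in_flight = 0
        self.delay = delay

    def fetch(self, url: str) -> str:
        with self._lock:
            self._in_flight += 1
            self.max_in_flight = max(self.max_in_flight, self._in_flight)
        try:
            time.sleep(self.delay)  # pretend to wait on the network
            return f"<Response for {url}>"
        finally:
            with self._lock:
                self._in_flight -= 1


def fetch_all(urls: List[str], fetch: Callable[[str], Any], size: int = 10) -> List[Any]:
    """Run fetch() over urls with at most `size` calls in flight; results keep input order."""
    with ThreadPoolExecutor(max_workers=size) as pool:
        return list(pool.map(fetch, urls))


if __name__ == "__main__":
    probe = ConcurrencyProbe()
    urls = [f"https://example.com/?q={i}" for i in range(30)]
    results = fetch_all(urls, probe.fetch, size=10)
    print(len(results), probe.max_in_flight)  # 30 and a number no larger than 10
```

Since `fetch` is any callable that takes a URL, `fetch_all(urls, requests.get, size=10)` would be a thread-based alternative with the same cap, assuming requests is installed.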




Answer 2:


I do not know the exact reason for the observed behavior with .map(). However, using the .imap() function with size=1 always returned a 'Response 200' in a few minutes of testing. Here is the code snippet:

rs = (grequests.get(u) for u in [BASE.format(t) for t in tickers])
# size=1: only one request is in flight at a time
rsm_iterator = grequests.imap(rs, exception_handler=exception_handler, size=1)
rsm_list = list(rsm_iterator)  # consume the iterator, waiting for every response
print(rsm_list)

If you don't want to wait for all requests to finish before working with the responses, iterate over the iterator directly:

rs = (grequests.get(u) for u in [BASE.format(t) for t in tickers])
rsm_iterator = grequests.imap(rs, exception_handler=exception_handler, size=1)
for r in rsm_iterator:
    print(r)
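Iterating this way also lets steps 3 and 4 of the original process run as each response arrives rather than after the whole batch. A minimal sketch: `extract_as_arrived` is a hypothetical helper that takes any iterable of response-like objects (for example the iterator returned by `grequests.imap`) plus an extraction function. Because `imap` hands back responses in completion order, the sketch labels each result with `response.url` (the final URL, which may differ from the requested one after a redirect) rather than relying on position. `SimpleNamespace` objects fake the responses so it runs offline.

```python
from types import SimpleNamespace
from typing import Callable, Iterable, Iterator, Tuple


def extract_as_arrived(responses: Iterable, extract: Callable[[str], str]) -> Iterator[Tuple[str, str]]:
    """Yield (url, extracted value) for each successful response as soon as it is available."""
    for resp in responses:
        if resp is None:             # defensive: skip failed requests
            continue
        if resp.status_code != 200:  # only parse successful pages
            continue
        yield resp.url, extract(resp.text)


if __name__ == "__main__":
    # Real use (assumed, needs bs4):
    #   it = grequests.imap(rs, exception_handler=exception_handler, size=1)
    #   get_title = lambda text: BeautifulSoup(text, "html.parser").find("title").get_text()
    #   for url, title in extract_as_arrived(it, get_title): print(url, title)
    fake = [
        SimpleNamespace(url="u1", status_code=200, text="hello"),
        None,
        SimpleNamespace(url="u2", status_code=404, text="missing"),
    ]
    for url, value in extract_as_arrived(fake, str.upper):
        print(url, value)  # u1 HELLO
```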


Source: https://stackoverflow.com/questions/46205491/understanding-requests-versus-grequests
