Python Requests Stream Data from API

非 Y 不嫁゛ submitted on 2021-01-03 06:48:05

Question


Use Case: I am trying to connect to a streaming API, ingest those events, filter them and save relevant ones.

Issue: My code works well until about the 1100th response. After that point the code doesn't crash, but it seems to stop pulling data from the stream. I am guessing it is some sort of buffer issue, but honestly streaming is new to me and I have no idea what is causing it.

Code

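import requests

def stream(url, headers=None):
    s = requests.Session()
    # stream=True defers downloading the body until it is iterated over
    r = s.get(url, headers=headers, stream=True)
    for line in r.iter_lines():
        if line:  # filter out keep-alive new lines
            print(line)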

I have also tried this without a session object and I get the same results.

Is there a parameter I am overlooking or a concept I am not aware of? I have scoured the docs/interwebs and nothing is jumping out at me.

Any help is much appreciated.

EDIT: Everything looks correct on my end. I think the stream just generates a ton of events upon initial connection, and then they slow way down. The issue now, however, is that after just a few minutes connected I am getting this error:

Traceback (most recent call last):
  File "C:\Users\joe\PycharmProjects\proj\venv\lib\site-packages\urllib3\response.py", line 572, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''
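In requests, this ValueError is raised inside urllib3 while it reads a chunked response, and it normally reaches your code as requests.exceptions.ChunkedEncodingError. That usually means the server (or something in between) closed the connection mid-stream. A common way to cope is to catch that error and reconnect. This is a minimal sketch assuming that reconnecting simply resumes the stream and that missing events during the gap is acceptable; `stream_forever`, `handle_line`, the retry limits, and the (5, 90) second connect/read timeouts are hypothetical choices, not anything taken from the question's API.

```python
import time

import requests

# Errors that usually mean "the connection dropped", not "the request is wrong".
# ChunkedEncodingError is what requests raises for a truncated/malformed chunk.
RECONNECT_ERRORS = (
    requests.exceptions.ChunkedEncodingError,
    requests.exceptions.ConnectionError,
    requests.exceptions.Timeout,
)


def stream_forever(url, handle_line, headers=None, max_retries=5, backoff=2.0):
    """Feed each non-empty line of a streaming response to handle_line,
    reconnecting after dropped connections.

    Assumptions (hypothetical, adjust for your API): reconnecting resumes the
    stream, events missed while disconnected are acceptable, and a 90 s read
    timeout is longer than the server's keep-alive interval.
    """
    failures = 0
    while True:
        try:
            # timeout=(connect, read); the read timeout applies to gaps between
            # received bytes, so a silent stream raises instead of hanging forever.
            with requests.get(url, headers=headers, stream=True,
                              timeout=(5, 90)) as resp:
                resp.raise_for_status()  # HTTP errors (e.g. 429) are not retried
                failures = 0  # connected successfully; reset the counter
                for line in resp.iter_lines():
                    if line:  # filter out keep-alive new lines
                        handle_line(line)
            return  # the server ended the stream cleanly
        except RECONNECT_ERRORS:
            failures += 1
            if failures > max_retries:
                raise  # give up after too many consecutive failures
            time.sleep(backoff * failures)  # linear backoff before reconnecting
```

Calling `stream_forever(url, print, headers=headers)` behaves like the original loop but survives dropped connections. Note that `max_retries` counts consecutive failures only, so a stream that keeps dropping after successful reconnects will keep being resumed.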

Answer 1:


Follow the guidelines in the "Body Content Workflow" section of the requests documentation for streaming data.

Sample approach:

import requests

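def get_stream(url):
    # Using the session and the response as context managers closes both
    # once the loop finishes or raises.
    with requests.Session() as s:
        with s.get(url, headers=None, stream=True) as resp:
            for line in resp.iter_lines():
                if line:  # filter out keep-alive new lines
                    print(line)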

url = 'https://jsonplaceholder.typicode.com/posts/1'
get_stream(url)

The output:

b'{'
b'  "userId": 1,'
b'  "id": 1,'
b'  "title": "sunt aut facere repellat provident occaecati excepturi optio reprehenderit",'
b'  "body": "quia et suscipit\\nsuscipit recusandae consequuntur expedita et cum\\nreprehenderit molestiae ut ut quas totam\\nnostrum rerum est autem sunt rem eveniet architecto"'
b'}'
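The output also shows that iter_lines() splits on newlines, not on JSON boundaries: a pretty-printed object comes back one fragment per line. For the question's use case (ingest, filter, save), a streaming API that sends one JSON object per line (newline-delimited JSON, an assumption; check your API's format) could be filtered with a small generator like the sketch below. The name `filter_events` and the `keep` predicate are hypothetical.

```python
import json
from typing import Callable, Iterable, Iterator


def filter_events(lines: Iterable[bytes],
                  keep: Callable[[dict], bool]) -> Iterator[dict]:
    """Parse newline-delimited JSON events and yield only those `keep` accepts."""
    for raw in lines:
        if not raw:  # keep-alive newline; nothing to parse
            continue
        try:
            event = json.loads(raw)  # json.loads accepts bytes (Python 3.6+)
        except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
            continue  # skip fragments or non-JSON lines in this sketch
        if keep(event):
            yield event
```

With a live response this would be used as `filter_events(resp.iter_lines(), lambda e: e.get("type") == "alert")`, where the `"type"` field and `"alert"` value are made up, writing each yielded event to a file or database. Because it is a generator, events are processed as they arrive instead of being buffered.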



Answer 2:


You might be getting rate-limited. Try printing the status code of the response object.

For example, in your code:

import requests

def stream(url, headers=None):
    s = requests.Session()
    r = s.get(url, headers=headers, stream=True)
    print(r.status_code)  # e.g. 200 (OK) or 429 (Too Many Requests)
    for line in r.iter_lines():
        if line:
            print(line)

Run this until you reach the 1100th response. The service you are calling may enforce a rate limit: if you get a 429 (Too Many Requests) response, you must wait a while before making more calls.
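Because the status code is sent once, at the start of a response, a 429 would show up when connecting or reconnecting rather than partway through an open stream. Many rate-limited APIs include a Retry-After header with the 429, given either as a number of seconds or as an HTTP date; whether this API does is an assumption. A sketch that honours it, where `retry_after_seconds`, `open_stream`, the 60-second default wait, and the attempt limit are all hypothetical:

```python
import email.utils
import time
from typing import Optional

import requests


def retry_after_seconds(value: Optional[str], default: float = 60.0,
                        now: Optional[float] = None) -> float:
    """Convert a Retry-After header (delay-seconds or HTTP-date) to seconds."""
    if value is None:
        return default  # header absent: fall back to a guessed wait
    value = value.strip()
    if value.isdigit():
        return float(value)  # e.g. "Retry-After: 120"
    try:
        when = email.utils.parsedate_to_datetime(value)  # e.g. an HTTP-date
    except (TypeError, ValueError):  # older Pythons raise TypeError here
        return default  # unparseable header
    now = time.time() if now is None else now
    return max(0.0, when.timestamp() - now)  # never a negative wait


def open_stream(url, headers=None, max_attempts=5):
    """Open a streaming GET, waiting and retrying while the server answers 429."""
    for _ in range(max_attempts):
        resp = requests.get(url, headers=headers, stream=True, timeout=(5, 90))
        if resp.status_code != 429:
            try:
                resp.raise_for_status()  # surface other HTTP errors immediately
            except requests.exceptions.HTTPError:
                resp.close()
                raise
            return resp  # caller should use it in a `with` block or close() it
        wait = retry_after_seconds(resp.headers.get("Retry-After"))
        resp.close()  # release the connection before sleeping
        time.sleep(wait)
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts")
```

The returned response can then be iterated with `iter_lines()` exactly as in the code above.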



Source: https://stackoverflow.com/questions/57497833/python-requests-stream-data-from-api
