Python requests arguments / dealing with API pagination

时光取名叫无心 · 2020-12-08 10:10

I'm playing around with the Angel List (AL) API and want to pull all jobs in San Francisco. Since I couldn't find an active Python wrapper for the API (if I make any h…

4 Answers
  • 2020-12-08 10:37

    I came across a scenario where the API didn't return page numbers but rather a min/max marker value. The loop below keeps requesting with the latest marker until the API stops returning a next value, at which point the while loop ends.

    import requests

    # url and headers are assumed to be defined earlier
    max_version = [1]  # single-element list doubles as the stop flag
    while len(max_version) > 0:
        r = requests.get(url, headers=headers, params={"page": max_version[0]}).json()
        next_page = r['page']
        if next_page is not None:
            max_version[0] = next_page
            # ... process the data in r here ...
        else:
            max_version.clear()  # no next marker: empty the list to stop the while loop
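
    The single-element list only exists to double as a stop flag; a plain variable reads more naturally. A minimal equivalent sketch, assuming as above that the response carries the next marker (or None) in r['page']. Unlike the version above, this also processes the final response before stopping:

    next_page = 1  # start marker; assumed to come back as None once exhausted
    while next_page is not None:
        r = requests.get(url, headers=headers, params={"page": next_page}).json()
        # ... process the data in r here ...
        next_page = r['page']  # advance to whatever marker the API handed back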
    
  • 2020-12-08 10:48

    Further improving on @dh762's answer: you can do all the requests inside a single while loop, avoiding the need for two yield statements.

    Eg:

    import requests
    
    session = requests.Session()
    
    def get_jobs():
        url = "https://api.angel.co/1/tags/1664/jobs"
        currP = 1
        totalP = 2  # assume a 2nd page exists; overwritten by the real count below
        while currP <= totalP:
            page = session.get(url, params={'page': currP}).json()
            totalP = page['last_page']  # the API reports the total number of pages
            currP += 1
            yield page

    for page in get_jobs():
        pass  # TODO: process the page
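
    Because the generator yields one page at a time, pages can be consumed lazily instead of being held in memory all at once. A small usage sketch, assuming (hypothetically) that each page body keeps its records under a 'jobs' key:

    all_jobs = []
    for page in get_jobs():
        all_jobs.extend(page.get('jobs', []))  # 'jobs' key is an assumption about the payload
    print(len(all_jobs))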
    
  • 2020-12-08 10:50

    Improving on @alecxe's answer: using a Python generator together with a requests Session improves performance and resource usage when querying many pages or very large pages, since the Session reuses the underlying HTTP connection and each page is yielded as soon as it arrives rather than accumulated in memory.

    import requests
    
    session = requests.Session()
    
    def get_jobs():
        url = "https://api.angel.co/1/tags/1664/jobs" 
        first_page = session.get(url).json()
        yield first_page
        num_pages = first_page['last_page']
    
        for page in range(2, num_pages + 1):
            next_page = session.get(url, params={'page': page}).json()
            yield next_page
    
    for page in get_jobs():
        pass  # TODO: process the page
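
    In practice any single request can stall or fail midway through the pagination, so it is worth failing fast. A hedged variant of the same generator using requests' standard timeout argument and raise_for_status() check (the 10-second value is an arbitrary choice):

    def get_jobs_safe():
        url = "https://api.angel.co/1/tags/1664/jobs"
        resp = session.get(url, timeout=10)  # don't wait forever on a stalled connection
        resp.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
        first_page = resp.json()
        yield first_page

        for page in range(2, first_page['last_page'] + 1):
            resp = session.get(url, params={'page': page}, timeout=10)
            resp.raise_for_status()
            yield resp.json()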
    
  • 2020-12-08 10:51

    Read last_page from the first response, then make a GET request for each remaining page in the range:

    import requests
    
    r_sanfran = requests.get("https://api.angel.co/1/tags/1664/jobs").json()
    num_pages = r_sanfran['last_page']
    
    for page in range(2, num_pages + 1):
        r_sanfran = requests.get("https://api.angel.co/1/tags/1664/jobs", params={'page': page}).json()
    print(r_sanfran['page'])
        # TODO: extract the data
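
    The same pattern wraps naturally into a small helper that returns every page at once, which is fine for a modest number of pages (for very large result sets, prefer the generator answers above). A minimal sketch:

    import requests

    def fetch_all_pages(url):
        first = requests.get(url).json()
        pages = [first]
        for n in range(2, first['last_page'] + 1):
            pages.append(requests.get(url, params={'page': n}).json())
        return pages

    pages = fetch_all_pages("https://api.angel.co/1/tags/1664/jobs")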
    