Python requests arguments / dealing with API pagination

时光取名叫无心 · 2020-12-08 10:10

I'm playing around with the Angel List (AL) API and want to pull all jobs in San Francisco. Since I couldn't find an active Python wrapper for the API (if I make any h…

4 Answers
  • 2020-12-08 10:37

    I came across a scenario where the API didn't return page numbers but rather a min/max marker value. The loop below keeps requesting with the latest marker until the API stops returning a next value, at which point the while loop ends.

    import requests

    # url and headers are assumed to be defined earlier
    max_version = [1]  # single-element list doubles as the stop flag
    while len(max_version) > 0:
        r = requests.get(url, headers=headers, params={"page": max_version[0]}).json()
        next_page = r['page']
        if next_page is not None:
            max_version[0] = next_page
            # ... process the data in r here ...
        else:
            max_version.clear()  # no next marker: empty the list to stop the while loop
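
    The single-element list only exists to double as a stop flag; a plain variable reads more naturally. A minimal equivalent sketch, assuming as above that the response carries the next marker (or None) in r['page']. Unlike the version above, this also processes the final response before stopping:

    next_page = 1  # start marker; assumed to come back as None once exhausted
    while next_page is not None:
        r = requests.get(url, headers=headers, params={"page": next_page}).json()
        # ... process the data in r here ...
        next_page = r['page']  # advance to whatever marker the API handed back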
    
  • 2020-12-08 10:48

    Further improving on @dh762's answer: you can do all the requests inside a single while loop, avoiding the need for two yield statements.

    Eg:

    import requests
    
    session = requests.Session()
    
    def get_jobs():
        url = "https://api.angel.co/1/tags/1664/jobs"
        currP = 1
        totalP = 2  # assume a 2nd page exists; overwritten by the real count below
        while currP <= totalP:
            page = session.get(url, params={'page': currP}).json()
            totalP = page['last_page']  # the API reports the total number of pages
            currP += 1
            yield page

    for page in get_jobs():
        pass  # TODO: process the page
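
    Because the generator yields one page at a time, pages can be consumed lazily instead of being held in memory all at once. A small usage sketch, assuming (hypothetically) that each page body keeps its records under a 'jobs' key:

    all_jobs = []
    for page in get_jobs():
        all_jobs.extend(page.get('jobs', []))  # 'jobs' key is an assumption about the payload
    print(len(all_jobs))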
    
  • 2020-12-08 10:50

    Improving on @alecxe's answer: using a Python generator together with a requests Session improves performance and resource usage when querying many pages or very large pages, since the Session reuses the underlying HTTP connection and each page is yielded as soon as it arrives rather than accumulated in memory.

    import requests
    
    session = requests.Session()
    
    def get_jobs():
        url = "https://api.angel.co/1/tags/1664/jobs" 
        first_page = session.get(url).json()
        yield first_page
        num_pages = first_page['last_page']
    
        for page in range(2, num_pages + 1):
            next_page = session.get(url, params={'page': page}).json()
            yield next_page
    
    for page in get_jobs():
        pass  # TODO: process the page
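
    In practice any single request can stall or fail midway through the pagination, so it is worth failing fast. A hedged variant of the same generator using requests' standard timeout argument and raise_for_status() check (the 10-second value is an arbitrary choice):

    def get_jobs_safe():
        url = "https://api.angel.co/1/tags/1664/jobs"
        resp = session.get(url, timeout=10)  # don't wait forever on a stalled connection
        resp.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
        first_page = resp.json()
        yield first_page

        for page in range(2, first_page['last_page'] + 1):
            resp = session.get(url, params={'page': page}, timeout=10)
            resp.raise_for_status()
            yield resp.json()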
    
  • 2020-12-08 10:51

    Read last_page from the first response, then make a GET request for each remaining page in the range:

    import requests
    
    r_sanfran = requests.get("https://api.angel.co/1/tags/1664/jobs").json()
    num_pages = r_sanfran['last_page']
    
    for page in range(2, num_pages + 1):
        r_sanfran = requests.get("https://api.angel.co/1/tags/1664/jobs", params={'page': page}).json()
    print(r_sanfran['page'])
        # TODO: extract the data
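
    The same pattern wraps naturally into a small helper that returns every page at once, which is fine for a modest number of pages (for very large result sets, prefer the generator answers above). A minimal sketch:

    import requests

    def fetch_all_pages(url):
        first = requests.get(url).json()
        pages = [first]
        for n in range(2, first['last_page'] + 1):
            pages.append(requests.get(url, params={'page': n}).json())
        return pages

    pages = fetch_all_pages("https://api.angel.co/1/tags/1664/jobs")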
    