Reddit API returning useless JSON

Submitted by 早过忘川 on 2021-01-27 05:38:32

Question


I'm trying to scrape new stories from Reddit using their API and Python's urllib2, but I keep getting JSON documents like this one:

{ u'kind': u'Listing', u'data': { u'modhash': u'', u'children': [], u'after': None, u'before': None }}

Here is my code:

import json
import time
import urllib2

def get_submissions(after=None):
    url = 'http://reddit.com/r/all/new.json?limit=100'
    if after:
        url += '&after=%s' % after

    _user_agent = 'Reddit Link Analysis Bot by PirateLogic @ github.com/jamesbrewer'
    _request = urllib2.Request(url, headers={'User-agent': _user_agent})
    _json = json.loads(urllib2.urlopen(_request).read())   

    return [story for story in _json['data']['children']], _json['data']['after']

if __name__ == '__main__':
    after = None
    stories = []
    limit = 1
    while len(stories) < limit:
        new_stories, after = get_submissions(after)
        stories.extend(new_stories)
        time.sleep(2) # The Reddit API allows one request every two seconds.
        print '%d stories collected so far .. sleeping for two seconds.' % len(stories)

What I've written is fairly short and straightforward, but I'm obviously overlooking something, or I don't fully understand the API or how urllib2 works.

Here's an example page from the API.

What's the deal?

EDIT: After trying to load the example page in another browser, I'm also seeing the JSON I posted at the top of the page. It seems to happen only for //new.json, though. If I try //hot.json or just /.json, I get what I want.


Answer 1:


Edit: As of 2013/02/22, the desired new sort no longer requires sort=new to be added as a URL parameter. This is because the rising sort is no longer provided under the /new route, but is provided by /rising [source].


The problem with the URL http://reddit.com/r/all/new.json?limit=100 is that the new pages use the rising sort by default. If you are logged in and have changed the default sort to new, then what you actually see is the result for the page http://reddit.com/r/all/new.json?limit=100&sort=new. Note the added parameter sort=new.

Thus the result is correct; it is just that the rising view has not been updated for /r/all.
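
As a rough illustration of the point above, here is a minimal Python 3 sketch (the question itself uses Python 2 and urllib2) that builds the listing URL with an explicit sort parameter. The helper name build_listing_url and its arguments are made up for this example and are not part of any Reddit library; it only assembles the URL and does not send a request.

```python
from urllib.parse import urlencode


def build_listing_url(subreddit, listing="new", sort=None, limit=100, after=None):
    """Build a Reddit listing URL (hypothetical helper, not part of any library)."""
    params = [("limit", limit)]
    if sort:
        # The explicit sort parameter the answer describes, e.g. sort=new.
        params.append(("sort", sort))
    if after:
        # Pagination token taken from the previous response's data.after field.
        params.append(("after", after))
    return "http://reddit.com/r/%s/%s.json?%s" % (subreddit, listing, urlencode(params))


print(build_listing_url("all", sort="new"))
# prints http://reddit.com/r/all/new.json?limit=100&sort=new
```

Per the edit note at the top of this answer, the extra parameter should no longer be needed after 2013/02/22, so treat this as a picture of the behaviour at the time the question was asked.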

On a related note, I strongly suggest using PRAW (the Python Reddit API Wrapper) rather than writing your own code if you plan to use more than a single part of the API. Here's the relevant code that you want:

import praw
r = praw.Reddit('YOUR DESCRIPTIVE USER AGENT NAME')
listing = list(r.get_subreddit('all').get_new_by_date())
print listing

If you simply want to iterate over the submissions you can omit the list() part.




Answer 2:


I was stumped on a similar (not the same as OP) problem for a while - no children in the API response. I figured I'd post this in case it's helpful to others getting to this question via a search engine:

If I open this url in my browser:

https://www.reddit.com/comments.json?limit=100

It seems to work fine, but when I send a request programmatically it returns no children. I tried changing the request's user-agent and similar tweaks, to no avail. I ended up using the /r/all comment stream instead:

https://www.reddit.com/r/all/comments.json?limit=100

Works fine in the browser and via a programmatic request. Still have no idea why the first url doesn't work.
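
For anyone who wants to detect this case in code, here is a small Python 3 sketch that checks whether a decoded listing has any children. It assumes the response has the shape shown in the question (a dict with kind and data.children); the function name listing_children is invented for this example.

```python
def listing_children(doc):
    """Return the children of a decoded Reddit Listing document, or [] if none."""
    if doc.get("kind") != "Listing":
        raise ValueError("not a Listing document: %r" % doc.get("kind"))
    return doc.get("data", {}).get("children", [])


# The empty response quoted in the question, written as a Python 3 dict.
empty = {"kind": "Listing",
         "data": {"modhash": "", "children": [], "after": None, "before": None}}

if not listing_children(empty):
    print("no children returned; try a different listing URL")
```

A check like this also gives a polling loop a reason to stop instead of spinning on an endpoint that keeps returning nothing.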




Answer 3:


http://www.reddit.com/r/all.json?limit=100 returns meaningful data

http://reddit.com/r/all/new?limit=100 (no .json) says there are no items...

It looks like reddit doesn't use /new the way you think it does, so the problem is in your use of the API.

If this answer is not sufficient please include a link to the reddit api docs.

Also, here's a quick note on REST. It looks like reddit is RESTful (I stand to be corrected, but that's what my experiments here tell me...). This means that dropping the .json extension from any of the URLs you are trying to access should give you a human-friendly version of the same data. This could be useful during testing: just look at the page in your browser and you will see what info reddit thinks you are asking for.
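
If you want to do that conversion programmatically, here is a minimal Python 3 sketch that strips a trailing .json from the URL path while keeping the query string. The function name human_url is made up for this example, and it only rewrites the string; whether reddit serves an equivalent page at the result is the assumption described above.

```python
from urllib.parse import urlsplit, urlunsplit


def human_url(json_url):
    """Strip a trailing .json from the path of a URL, keeping the query string."""
    parts = urlsplit(json_url)
    path = parts.path
    if path.endswith(".json"):
        path = path[:-len(".json")]
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(human_url("http://www.reddit.com/r/all.json?limit=100"))
# prints http://www.reddit.com/r/all?limit=100
```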



Source: https://stackoverflow.com/questions/13328798/reddit-api-returning-useless-json
