Retrieving RSS posts older than those included in feed

走远了吗. Submitted on 2019-12-11 03:59:31

Question


When creating an RSS reader, you download the XML document that the RSS feed link points to, and you can parse it manually or with the SyndicationFeed class in the System.ServiceModel.Syndication namespace.

So if we take Scott Guthrie's blog as an example, you download the RSS feed document here, and parse it. My problem is that this document only holds 15 items, yet he has been blogging for a number of years.

Is there a standard or established way of getting the older posts that are not included in the RSS feed document? Or do you have to find the base address for the blog posts and then parse the pages of the site from there to get them? How do you avoid missing posts on high-volume blogs?


Answer 1:


With RSS/Atom you can't query older articles.

I built an RSS archival service (https://app.pub.center). All of our data is free to use via REST. We charge money for push notifications.

PubCenter polls its catalog of RSS feeds daily and caches the articles. You can then retrieve these articles in chronological order. For example:

Page 1 of The Atlantic https://pub.center/feed/02702624d8a4c825dde21af94e9169773454e0c3/articles?limit=10&page=1

Page 2 of The Atlantic https://pub.center/feed/02702624d8a4c825dde21af94e9169773454e0c3/articles?limit=10&page=2
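
The two URLs above differ only in the `page` parameter, so walking the archive is a loop over page numbers. The following is a minimal sketch, not PubCenter's documented client: the function names `articles_url` and `iter_articles` are hypothetical, and it assumes (without confirmation from the answer) that each page comes back as a list of article records and that an empty page marks the end. The actual download is left to a caller-supplied `fetch_page` function so that nothing about the response format is hard-coded.

```python
from typing import Callable, Iterator, List
from urllib.parse import urlencode

# URL prefix taken from the example links in the answer.
BASE = "https://pub.center/feed"


def articles_url(feed_id: str, page: int, limit: int = 10) -> str:
    """Build a URL of the same shape as the examples in the answer."""
    query = urlencode({"limit": limit, "page": page})
    return f"{BASE}/{feed_id}/articles?{query}"


def iter_articles(
    feed_id: str,
    fetch_page: Callable[[str], List[dict]],
    limit: int = 10,
    max_pages: int = 1000,
) -> Iterator[dict]:
    """Yield articles page by page until a page comes back empty.

    ASSUMPTION: fetch_page(url) returns a list of article dicts and an
    empty list once the archive is exhausted. The real response format
    is not described in the answer.
    """
    for page in range(1, max_pages + 1):
        items = fetch_page(articles_url(feed_id, page, limit))
        if not items:
            return
        yield from items
```

Passing the fetch function in also makes the loop easy to exercise with canned data before pointing it at the live service.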




Answer 2:


As the replies to "How Do I Fetch All Old Items on an RSS Feed?" already mentioned, a feed may not provide archival data, but historical items may be available from another source.

Archive.org’s Wayback Machine has an API to access historical content, including RSS feeds (provided its crawlers have captured them). I’ve created the web tool Backfeed, which uses this API to regenerate a feed containing the concatenated historical items. If you'd like to discuss the implementation in detail, please get in touch.
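
The general idea of rebuilding a longer feed from archived copies can be sketched as follows. This is not Backfeed's implementation, which the answer does not describe. It assumes the Wayback Machine CDX server endpoint and the `id_` raw-capture URL form, handles only plain RSS 2.0 `<item>` elements (not Atom), and uses hypothetical helper names. Downloading is omitted; `merge_rss_items` takes the snapshot XML as strings.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List, Optional
from urllib.parse import urlencode

# ASSUMPTION: the Wayback Machine CDX server endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(feed_url: str) -> str:
    """URL that lists successful, distinct captures of feed_url as JSON."""
    params = {
        "url": feed_url,
        "output": "json",
        "filter": "statuscode:200",  # skip redirects and errors
        "collapse": "digest",        # drop consecutive identical captures
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def snapshot_url(timestamp: str, original: str) -> str:
    """URL of one capture; the id_ suffix asks for the unmodified body."""
    return f"https://web.archive.org/web/{timestamp}id_/{original}"


def merge_rss_items(snapshots: Iterable[str]) -> List[Dict[str, Optional[str]]]:
    """Combine items from several RSS 2.0 documents, oldest snapshot first.

    Items are de-duplicated by <guid>, falling back to <link>.
    """
    seen: Dict[str, Dict[str, Optional[str]]] = {}
    for xml_text in snapshots:
        root = ET.fromstring(xml_text)
        for item in root.iter("item"):
            key = item.findtext("guid") or item.findtext("link")
            if key and key not in seen:
                seen[key] = {
                    "title": item.findtext("title"),
                    "link": item.findtext("link"),
                }
    return list(seen.values())
```

Posts published and rotated out of the feed between two captures will still be missing, so the result is only as complete as the archive's crawl history.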



Source: https://stackoverflow.com/questions/5761954/retrieving-rss-posts-older-than-those-included-in-feed
