Policy for polling RSS

遥遥无期 2020-12-22 20:39

I have an application that polls several rss sources on the web.

What is the etiquette when polling other people's web servers? How frequently should I poll, etc.?

What a

8 answers
  • 2020-12-22 21:03
    1. Make use of HTTP caching. Send conditional request headers (If-None-Match with the ETag you stored, If-Modified-Since with the Last-Modified value you stored) and handle the 304 Not Modified response. This way you can save a lot of bandwidth. Additionally, some scripts recognize the If-Modified-Since header and return only partial content (i.e. only the two or three newest items instead of all 30 or so).

    2. Don’t poll RSS from services that support RPC Ping (or another PUSH service, such as PubSubHubbub). That is, if you’re receiving PUSH notifications from a service, you don’t have to poll it at the standard interval; poll once a day just to check that the mechanism still works (ping can be disabled, reconfigured, broken, etc.). This way you fetch the RSS only on receiving a notification, not every hour or so.

    3. Check the TTL (in RSS) or the cache-control headers (Expires, for Atom), and don’t fetch again until the resource expires.

    4. Try to adapt to the frequency of new items in each individual RSS feed. If a particular feed had only two updates in the past week, don’t fetch it more than once a day. AFAIR Google Reader does that.

    5. Lower the rate at night or at other times when the traffic on your site is low.

    6. As a last resort, do it once an hour. ;)
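
Points 1 and 4 above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not a definitive implementation: the function names, the User-Agent string, and the "poll at half the average gap between updates" rule are assumptions made for the example, while the one-hour floor and one-day ceiling come from the advice above.

```python
import urllib.error
import urllib.request
from datetime import timedelta


def conditional_headers(etag=None, last_modified=None):
    """Build request headers that allow the server to reply 304 Not Modified."""
    # The User-Agent value is a hypothetical placeholder; identify your own poller.
    headers = {"User-Agent": "example-feed-poller/0.1"}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_feed(url, etag=None, last_modified=None, timeout=30):
    """Return (body, etag, last_modified); body is None if the feed is unchanged."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as error:
        # urllib reports 304 as an HTTPError; treat it as "nothing new".
        if error.code == 304:
            return None, etag, last_modified
        raise


def adaptive_interval(updates_last_week,
                      floor=timedelta(hours=1),
                      ceiling=timedelta(days=1)):
    """Pick a polling interval from how often the feed changed in the last week."""
    if updates_last_week <= 0:
        return ceiling
    average_gap = timedelta(days=7) / updates_last_week
    # Assumed heuristic: poll at half the average gap, clamped to [floor, ceiling].
    return max(floor, min(ceiling, average_gap / 2))
```

Store the returned ETag and Last-Modified values per feed and pass them back on the next call to `fetch_feed`. With two updates in the past week, `adaptive_interval(2)` is clamped to the one-day ceiling, in line with point 4.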

  • 2020-12-22 21:03

    RSS has a ttl setting, so really you should only poll when the TTL expires.

    But I guess if they don't put one in, it's their problem, and you should poll something like once an hour.
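
As a rough sketch of that rule, the snippet below reads the `<ttl>` element of an RSS 2.0 channel (a value in minutes) and falls back to one hour when it is missing or unusable. The function name and the choice to ignore non-positive values are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from datetime import timedelta

# Fallback suggested in the answer above: about once an hour.
DEFAULT_INTERVAL = timedelta(hours=1)


def poll_interval_from_rss(xml_text, default=DEFAULT_INTERVAL):
    """Return the feed's ttl as a timedelta, or `default` if it has none."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return default
    # In RSS 2.0, <ttl> is a child of <channel> and is given in minutes.
    ttl_text = root.findtext("channel/ttl")
    if ttl_text is None:
        return default
    try:
        minutes = int(ttl_text.strip())
    except ValueError:
        return default
    # Assumption: a zero or negative ttl is treated as "not specified".
    return timedelta(minutes=minutes) if minutes > 0 else default
```

For instance, a feed containing `<ttl>120</ttl>` would be polled at most every two hours, and a feed without the element once an hour.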

  • 2020-12-22 21:13

    This is not a complete answer, but look for push alerts.

    The RSS blog indicates that a best practice is asking weblogs.com about changed blogs.

    There is also some, er, hubbub, about pubsub, a way to subscribe to push alerts that has some momentum.

  • 2020-12-22 21:14

    Once an hour, if you want to just go by rule-of-thumb (but the link explains some better options).

  • 2020-12-22 21:19

    I note that Twitter uses (custom) X-RateLimit-Remaining and X-RateLimit-Limit headers (in the HTTP response) to indicate the maximum number of authorised polls for Atom feeds. It's somewhat of a pity that they haven't used the standard Expires field (which is set 30 years in the past :P). I guess their advertising of Cache-Control: no-cache also rules out the generic heuristic expiration time defined in RFC 2616 (section 13.2.*). It's even more of a pity that Atom doesn't seem to provide any standardised way to tell clients how often they should poll the feed.
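
A poller could take those two headers into account along these lines. This is only a sketch: the header names are the ones quoted above, but the function names and the idea of backing off once fewer than 10% of the allowed polls remain are assumptions for illustration, not anything the service documents.

```python
def parse_rate_limit(headers):
    """Return (remaining, limit) from response headers, or None if absent."""
    # HTTP header names are case-insensitive, so normalise the keys first.
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        remaining = int(lowered["x-ratelimit-remaining"])
        limit = int(lowered["x-ratelimit-limit"])
    except (KeyError, ValueError):
        return None
    return remaining, limit


def should_back_off(headers, reserve_fraction=0.1):
    """True when the remaining quota has dropped to the reserve or below."""
    parsed = parse_rate_limit(headers)
    if parsed is None:
        # No usable rate-limit information; fall back to the normal schedule.
        return False
    remaining, limit = parsed
    if limit <= 0:
        return True
    # reserve_fraction is a hypothetical tuning knob, not part of any API.
    return remaining <= limit * reserve_fraction
```

After each response, pass its headers to `should_back_off` and lengthen the polling interval whenever it returns True.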

  • 2020-12-22 21:22

    Once an hour is a frequency I've heard.
