How to get the feed URL(s) from a website?

这一生的挚爱 posted on 2021-02-10 16:09:19

Question


As per the official documentation, properly set up websites should indicate the URL of their RSS / Atom feed(s) when asked politely:

GET / HTTP/1.1
Host: example.com
Accept: application/rss+xml, application/xhtml+xml, text/html

When an HTTP server (or server-side script) gets this, it should redirect the HTTP client to the feed. It should do this with an HTTP 302 Found. Something like:

HTTP/1.1 302 Found
Location: http://example.com/feed

I'm trying to get this response, without luck:

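const request = require('request');

request(
  {
    method: 'GET',
    url: 'https://stackoverflow.com',
    followRedirect: false, // report a 302 instead of following it
    // Accept must be sent as a request header; a top-level `accept` option is ignored.
    headers: { Accept: 'application/rss+xml, application/xhtml+xml, text/html' }
  },
  function (error, response, body) {
    if (error) throw error;
    console.log('statusCode:', response.statusCode);
  }
);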

Yields:

statusCode: 200

How do I formulate my request so that the website responds with the feed URL(s)?


Answer 1:


It is not common practice for websites to respond with their RSS feed when a request for the home page lists the application/rss+xml MIME type in its Accept header. The Mozilla documentation you've linked describes a suggestion I had never seen before, despite many years of involvement in RSS as a developer.

A more established and widely adopted method for a site to identify its RSS feed is a technique called RSS Autodiscovery. Open the site's home page and look for this tag in the HEAD section:

<link rel="alternate" type="application/rss+xml" title="RSS"
    href="http://feeds.example.com/rss-feed">

The type attribute can be any of the MIME types for RSS, Atom or JSON Feed feeds.
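
A minimal sketch of autodiscovery in Node.js, assuming you already have the page's HTML as a string. The names findFeedLinks, parseAttributes and FEED_TYPES are hypothetical, not part of any library, and the MIME type list is an assumption (application/feed+json is the type suggested by JSON Feed 1.1). To stay dependency-free it scans for <link> tags with regular expressions, which is a simplification and not a real HTML parser: it will miss tags whose attribute values contain ">" and does not ignore commented-out markup.

```javascript
// Assumed list of feed MIME types for RSS, Atom and JSON Feed.
const FEED_TYPES = [
  'application/rss+xml',
  'application/atom+xml',
  'application/feed+json',
];

// Hypothetical helper: collect the attributes of a single tag into an object.
// Handles double-quoted, single-quoted and unquoted values.
function parseAttributes(tag) {
  const attrs = {};
  const re = /([a-zA-Z_:][-\w:.]*)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/g;
  let m;
  while ((m = re.exec(tag)) !== null) {
    const value = m[2] !== undefined ? m[2] : m[3] !== undefined ? m[3] : m[4];
    attrs[m[1].toLowerCase()] = value;
  }
  return attrs;
}

// Hypothetical helper: return every autodiscovery link found in `html`.
// Relative href values are resolved against `baseUrl` (the page's own URL).
function findFeedLinks(html, baseUrl) {
  const feeds = [];
  const linkTags = html.match(/<link\b[^>]*>/gi) || [];
  for (const tag of linkTags) {
    const attrs = parseAttributes(tag);
    const rels = (attrs.rel || '').toLowerCase().split(/\s+/);
    const type = (attrs.type || '').trim().toLowerCase();
    if (rels.includes('alternate') && FEED_TYPES.includes(type) && attrs.href) {
      feeds.push({
        title: attrs.title || '',
        type: type,
        url: new URL(attrs.href, baseUrl).href,
      });
    }
  }
  return feeds;
}
```

Calling findFeedLinks(body, 'https://example.com/') on a downloaded page returns an array of { title, type, url } objects, one per feed the page advertises, or an empty array if it advertises none.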




Answer 2:


The material you quote is prefixed with:

Although this advanced technique for syndication is not required, support of this is recommended, especially for web sites and applications with high performance needs.

If you get HTML back, then you should construct a DOM with an HTML parser and then search it for the appropriate <link> element as described in an earlier section of that page.
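
As a rough sketch of that decision, the helper below looks at the status code and headers of the response and reports whether the server redirected to a feed or returned HTML that still needs to be searched. The name interpretResponse and the shape of its return value are hypothetical; it also assumes header names are already lowercased, as Node's http module and the request library provide them.

```javascript
// Hypothetical helper: classify the response to a request that asked for a feed
// via the Accept header, with redirects NOT followed automatically.
function interpretResponse(statusCode, headers, requestUrl) {
  const location = headers['location'];

  // The server honoured the Accept header and points at the feed.
  if (statusCode >= 300 && statusCode < 400 && location) {
    // Location may be relative, so resolve it against the requested URL.
    return { kind: 'redirect', feedUrl: new URL(location, requestUrl).href };
  }

  // Strip parameters such as "; charset=utf-8" before comparing.
  const contentType = (headers['content-type'] || '')
    .split(';')[0]
    .trim()
    .toLowerCase();

  // Plain HTML came back: the caller should parse the body and look for
  // the <link> element instead.
  if (contentType === 'text/html' || contentType === 'application/xhtml+xml') {
    return { kind: 'html' };
  }

  return { kind: 'other' };
}
```

Inside a request callback this could be called as interpretResponse(response.statusCode, response.headers, 'https://stackoverflow.com/'); a result of kind 'html' means falling back to searching the page for the <link> element.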



Source: https://stackoverflow.com/questions/49479712/how-to-get-the-feed-urls-from-a-website
