atom-feed

Automatically extracting feed links (Atom, RSS, etc.) from webpages [closed]

老子叫甜甜 submitted on 2020-01-01 00:12:08
Question: I have a huge list of URLs, and my task is to feed them to a Python script that should output the feed URLs, if there are any. Is there an API, library, or existing code that can help?
Answer 1: I second waffle paradox in recommending Beautiful Soup for parsing the HTML and then getting the <link rel="alternate"> tags,
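The approach the answer describes (parse the HTML, then collect the `<link rel="alternate">` tags) can be sketched as follows. This is a minimal illustration, not the answerer's code: it swaps Beautiful Soup for Python's standard-library `html.parser` so it runs without extra installs, and the names `FeedLinkParser`, `find_feed_links`, and `FEED_TYPES` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types treated as feeds here (an assumption of this sketch).
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects href values of <link rel="alternate"> tags with a feed type."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute values can be None (e.g. bare attributes), so normalize.
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        if "alternate" in rels and attr.get("type", "").lower() in FEED_TYPES:
            href = attr.get("href")
            if href:
                # Resolve relative hrefs against the page URL.
                self.feeds.append(urljoin(self.base_url, href))


def find_feed_links(html, base_url=""):
    """Return the feed URLs advertised in an HTML document (hypothetical helper)."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds
```

Fetching each page (for example with `urllib.request`) is left out; the function only needs the HTML text and, for relative links, the page URL.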

How to detect if a page is an RSS or ATOM feed

落爺英雄遲暮 submitted on 2019-12-30 10:59:48
Question: I'm currently building a new online feed reader in PHP. One of the features I'm working on is feed auto-discovery: if a user enters a website URL, the script will detect that it's not a feed and look for the real feed URL by parsing the HTML for the proper tag. The problem is that the way I'm currently detecting whether the URL is a feed or a website only works part of the time, and I know it can't be the best solution. Right now I'm taking the cURL response and running it through simplexml_load_string,
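One common way to decide whether a response body is a feed is to parse it as XML and inspect the root element. The sketch below shows that idea in Python rather than PHP; it is not the asker's code, `detect_feed_type` is a hypothetical name, and it only recognizes an Atom `<feed>` root or an RSS `<rss>` root.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def detect_feed_type(body):
    """Return "atom", "rss", or None for a response body (hypothetical helper)."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Not well-formed XML, which is typical of ordinary HTML pages.
        return None
    # ElementTree writes namespaced tags as "{namespace}localname".
    if root.tag == "{%s}feed" % ATOM_NS:
        return "atom"
    if root.tag == "rss":
        return "rss"
    # Well-formed XML with another root (e.g. XHTML) is not treated as a feed.
    return None
```

Checking the root element avoids relying on the `Content-Type` header, which servers often set incorrectly for feeds.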

'OR' operator in XPath predicate?

北城余情 submitted on 2019-12-30 02:36:11
Question: What is the XPath expression to select <link> elements with type="application/rss+xml" OR type="application/atom+xml" (RSS and Atom feeds)? link[@rel='alternate'][@type='application/rss+xml'] selects RSS feeds, and link[@rel='alternate'][@type='application/atom+xml'] selects Atom feeds. But what is the single XPath expression for selecting them both? Thank you.
Answer 1: Use: link[@rel='alternate'][@type='application/rss+xml' or @type='application/atom+xml'] See http://www.w3.org/TR/xpath/#NT-OrExpr You
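For readers working in Python, note that the standard-library `xml.etree.ElementTree` implements only a subset of XPath and does not accept `or` inside a predicate. The sketch below therefore approximates the same selection by matching `rel='alternate'` in the path and filtering on `type` in Python; `select_feed_links` and `FEED_TYPES` are hypothetical names, and the input is assumed to be well-formed markup without a default namespace.

```python
import xml.etree.ElementTree as ET

# The two type values from the question.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


def select_feed_links(xhtml):
    """Return <link rel="alternate"> elements whose type is RSS or Atom."""
    root = ET.fromstring(xhtml)
    return [
        link
        for link in root.findall(".//link[@rel='alternate']")
        if link.get("type") in FEED_TYPES  # stands in for the XPath "or"
    ]
```

A full XPath 1.0 engine can evaluate the single expression from the answer directly.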

Sax parsing and encoding

时光怂恿深爱的人放手 submitted on 2019-12-28 18:04:24
Question: I have a contact who is having trouble with SAX when parsing RSS and Atom files. According to him, text coming from the Item elements appears to be truncated at an apostrophe or, sometimes, at an accented character. There seems to be an encoding problem too. I've given SAX a try and I see some truncation as well, but I haven't been able to dig further. I'd appreciate some suggestions if someone out there has tackled this before. This is the code that's being used in the

SelectNodes not working on Stack Overflow feed

試著忘記壹切 submitted on 2019-12-28 04:18:04
Question: I'm trying to add support for Stack Overflow feeds in my RSS reader, but SelectNodes and SelectSingleNode have no effect. This probably has something to do with Atom and XML namespaces that I just don't understand yet. I have gotten it to work by removing all attributes from the feed tag, but that's a hack and I would like to do it properly. So, how do you use SelectNodes with Atom feeds? Here's a snippet of the feed. <?xml version="1.0" encoding="utf-8"?> <feed xmlns="http://www.w3.org/2005
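The symptom described (queries that match nothing until the `xmlns` attribute is removed) is typical of querying a document that declares a default namespace with unqualified names. In .NET the usual remedy is to register a prefix with an `XmlNamespaceManager` and pass it to `SelectNodes`. The sketch below illustrates the same principle in Python's `xml.etree.ElementTree`, assuming the standard Atom namespace; `SAMPLE`, `NS`, and `entry_titles` are hypothetical, and the sample feed is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Map a prefix of our choosing to the Atom namespace URI.
NS = {"atom": "http://www.w3.org/2005/Atom"}

# Made-up sample data, not the feed from the question.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry><title>First</title></entry>
  <entry><title>Second</title></entry>
</feed>"""


def entry_titles(feed_xml):
    """Return the title of every Atom entry, using namespace-qualified paths."""
    root = ET.fromstring(feed_xml)
    return [
        entry.findtext("atom:title", default="", namespaces=NS)
        for entry in root.findall("atom:entry", NS)
    ]


# An unqualified query finds nothing because the elements live in the Atom namespace.
unqualified = ET.fromstring(SAMPLE).findall("entry")
```

The prefix is local to the query; it does not need to match any prefix used in the feed itself, only the namespace URI has to match.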

Is there a tool for parsing feeds in Django

跟風遠走 submitted on 2019-12-24 14:25:12
Question: I did some googling and didn't find anything complete for my problem, but it is so generic that there has to be something. I need a feed-parsing tool for my Django app (I want to fetch an Atom feed from somewhere and store its contents). I found some references to feedparser.py, but the actual site is long gone. Could you provide some pointers?
Answer 1: feedparser is still pretty much the canonical solution for this in Python. It's very far from gone: see the documentation here and the actual project

STR_TO_DATE and ISO8601 Atomtime format

落爺英雄遲暮 submitted on 2019-12-24 10:51:02
Question: I have a MySQL database, which I cannot alter, that I read dates from. The issue is that a varchar column stores the date. The date is stored in the atomtime format, e.g. 2014-06-01T00:00:00+02:00. I cannot figure out how to specify the format in the STR_TO_DATE function. I tried STR_TO_DATE(Endtime, '%Y-%m-%dT%H:%i:%s+02:00'), but that doesn't work. Does anyone have a solution for this? I am trying to run the following query (which is not working properly): SELECT *, COUNT(*) as antal

WordPress RSS feed with custom description?

邮差的信 submitted on 2019-12-24 09:13:53
Question: Is there a nice and simple way to add a filter to the WordPress RSS feed functions? I want to insert some custom text into the <description> tag of my RSS2 feed and the <summary> tag of my Atom feed. Is there an easy way to do that? I don't have templates for my feeds in my theme (like wp-rss2.php or wp-atom.php). I've just added the normal <link rel="alternate" type="application/rss+xml" title="<?php bloginfo('name'); ?> Atom Feed" href="<?php bloginfo('atom_url'); ?>" /> to my <head> Any

Trouble parsing XML using PHP SimpleXMLElement

孤人 submitted on 2019-12-23 05:16:17
Question: I am working with Google's provisioning API and I am using PHP's SimpleXMLElement to parse the XML response. SimpleXMLElement isn't parsing the response correctly. Here's a sample. <?php $xml_response = <<<EOD <?xml version='1.0' encoding='UTF-8'?> <entry xmlns='http://www.w3.org/2005/Atom' xmlns:apps='http://schemas.google.com/apps/2006'> <id>https://apps-apis.google.com/a/feeds/alias/2.0/gethelp_example.com/helpdesk%40gethelp%5Fexample.com</id> <updated>2014-05-06T00:53:35.817Z</updated>