feedparser

Get Feeds from FeedParser and Import to Pandas DataFrame

孤街浪徒 submitted on 2021-02-05 20:36:45
Question: I'm learning Python. As practice I'm building an RSS scraper with feedparser, putting the output into a pandas DataFrame, and trying to mine it with NLTK... but first I'm getting a list of articles from multiple RSS feeds. I used this post on how to pass multiple feeds and combined it with an answer I got previously to another question on how to get it into a pandas DataFrame. The problem is that I want to be able to see the data from all the feeds in my DataFrame. Currently I'm only able to access
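
One way to keep entries from every feed, rather than only the last one parsed, is to append each entry to a single list of rows before building the DataFrame. The following is a minimal sketch, not the asker's code: `collect_rows` and the `source` column are hypothetical names, and plain dicts stand in for the dict-like objects that `feedparser.parse` returns.

```python
from typing import Any, Dict, List, Mapping


def collect_rows(parsed_feeds: Mapping[str, Mapping[str, Any]]) -> List[Dict[str, Any]]:
    """Flatten the entries of several parsed feeds into one list of rows.

    `parsed_feeds` maps a label (for example the feed URL) to a parsed feed,
    assumed to be dict-like with an "entries" list of dict-like entries.
    """
    rows = []
    for source, parsed in parsed_feeds.items():
        for entry in parsed.get("entries", []):
            rows.append(
                {
                    "source": source,              # which feed the row came from
                    "title": entry.get("title"),   # None if the entry has no title
                    "link": entry.get("link"),
                }
            )
    return rows


# Stand-in data shaped like parsed feeds (assumption: real input would come
# from feedparser.parse(url) for each URL).
example = {
    "feed-a": {"entries": [{"title": "A1", "link": "http://example.com/a1"}]},
    "feed-b": {"entries": [{"title": "B1", "link": "http://example.com/b1"}]},
}
rows = collect_rows(example)
```

The resulting list can then be passed to `pandas.DataFrame(rows)`, giving one row per article with a column that records the originating feed.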

How to detect if a page is an RSS or ATOM feed

落爺英雄遲暮 submitted on 2019-12-30 10:59:48
Question: I'm currently building a new online feed reader in PHP. One of the features I'm working on is feed auto-discovery. If a user enters a website URL, the script will detect that it's not a feed and look for the real feed URL by parsing the HTML for the proper tag. The problem is that the way I'm currently detecting whether the URL is a feed or a website only works part of the time, and I know it can't be the best solution. Right now I'm taking the cURL response and running it through simplexml_load_string,
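
The question is about PHP, but the idea of inspecting the root element of the response can be sketched in Python with the standard library. This is an illustration under assumptions, not the asker's approach: `detect_feed_type` is a hypothetical helper, and it only recognizes RSS 2.0, RSS 1.0 (RDF), and Atom 1.0 roots.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def detect_feed_type(body):
    """Return "rss", "atom", or None based on the document's root element."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Not well-formed XML (typical for ordinary HTML pages).
        return None
    if root.tag == "rss":
        return "rss"                      # RSS 0.9x / 2.0
    if root.tag == "{%s}RDF" % RDF_NS:
        return "rss"                      # RSS 1.0 uses an rdf:RDF root
    if root.tag == "{%s}feed" % ATOM_NS:
        return "atom"                     # Atom 1.0
    return None                           # well-formed XML, but not a feed
```

A `None` result would be the signal to fall back to auto-discovery, that is, to scanning the HTML for the feed link tag.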

Django RSS feedparser returns a feed with no “title”

耗尽温柔 submitted on 2019-12-22 06:35:25
Question: I'm writing a basic RSS feed reader in Django. I have a form in which a user submits an RSS feed, and I add it to their feeds list. But for some reason, I'm unable to extract basic information about the feed using feedparser. When I run the following code: def form_valid(self, form): user = self.request.user link = form.cleaned_data['link'] feed = feedparser.parse(link).feed title = feed.title try: feed_obj = Feed.objects.get(link=link) except ObjectDoesNotExist: feed_obj = Feed(link=link,

How to parse the “<media:group>” using feedparser?

為{幸葍}努か submitted on 2019-12-20 16:48:52
Question: The RSS file is shown below. I want to get the content in the media:group section. I checked the feedparser documentation, but it does not seem to mention this. How can I do it? Any help is appreciated. <?xml version="1.0" encoding="UTF-8"?> <rss xmlns:ymusic="http://music.yahoo.com/rss/1.0/ymusic/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" xmlns:dc="http://purl.org/dc/elements/1.1/" version
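
If feedparser's own output does not expose what is needed, one alternative is to read the raw XML with the standard library and address `media:group` through its namespace. The sketch below sidesteps feedparser entirely. The namespace URI is the `xmlns:media` value from the excerpt above; the sample feed, its URLs, and the `media_group_urls` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI mapping; the URI is the xmlns:media value from the feed.
NS = {"media": "http://search.yahoo.com/mrss/"}

# Hypothetical miniature feed, not the asker's actual file.
SAMPLE = """<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
  <channel>
    <item>
      <title>Example item</title>
      <media:group>
        <media:content url="http://example.com/a.mp4" type="video/mp4"/>
        <media:content url="http://example.com/a.webm" type="video/webm"/>
      </media:group>
    </item>
  </channel>
</rss>"""


def media_group_urls(xml_text):
    """Return (item title, [media:content URLs inside media:group]) per item."""
    root = ET.fromstring(xml_text)
    result = []
    for item in root.iter("item"):
        contents = item.findall("media:group/media:content", NS)
        result.append((item.findtext("title"), [c.get("url") for c in contents]))
    return result


groups = media_group_urls(SAMPLE)
```

Other children of `media:group` could be reached the same way by changing the path passed to `findall`.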

How to parse the “<media:group>” using feedparser?

白昼怎懂夜的黑 submitted on 2019-12-20 16:48:10
Question: The RSS file is shown below. I want to get the content in the media:group section. I checked the feedparser documentation, but it does not seem to mention this. How can I do it? Any help is appreciated. <?xml version="1.0" encoding="UTF-8"?> <rss xmlns:ymusic="http://music.yahoo.com/rss/1.0/ymusic/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" xmlns:dc="http://purl.org/dc/elements/1.1/" version

Parsing different date formats from feedparser in python?

十年热恋 submitted on 2019-12-20 12:47:08
Question: I'm trying to get the dates from entries in two different RSS feeds through feedparser. Here is what I'm doing: import feedparser as fp reddit = fp.parse("http://www.reddit.com/.rss") cc = fp.parse("http://contentconsumer.com/feed") print reddit.entries[0].date print cc.entries[0].date And here's how they come out: 2008-10-21T22:23:28.033841+00:00 Wed, 15 Oct 2008 10:06:10 +0000 I want to get to the point where I can easily find out which is newer. I've tried using the datetime module of
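
The two strings above are in ISO 8601 and RFC 822 style respectively, and both can be turned into comparable `datetime` objects with the standard library. This is a minimal sketch assuming Python 3.7 or later; `parse_feed_date` is a hypothetical helper that handles only these two formats.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime


def parse_feed_date(text):
    """Parse an ISO 8601 or RFC 822 style date string into a datetime."""
    try:
        # Handles strings such as 2008-10-21T22:23:28.033841+00:00
        return datetime.fromisoformat(text)
    except ValueError:
        # Handles strings such as Wed, 15 Oct 2008 10:06:10 +0000
        return parsedate_to_datetime(text)


reddit_date = parse_feed_date("2008-10-21T22:23:28.033841+00:00")
cc_date = parse_feed_date("Wed, 15 Oct 2008 10:06:10 +0000")

# Both values carry a UTC offset, so they can be compared directly.
newer = max(reddit_date, cc_date)
```

Comparing a timezone-aware value with a naive one raises `TypeError`, so dates without an offset would need a timezone attached first. feedparser also exposes pre-parsed fields such as `published_parsed`, which may make manual parsing unnecessary.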

How do I get SQLAlchemy to correctly insert a Unicode ellipsis into a MySQL table?

给你一囗甜甜゛ submitted on 2019-12-20 09:19:24
Question: I am trying to parse an RSS feed with feedparser and insert it into a MySQL table using SQLAlchemy. I was actually able to get this running just fine, but today the feed had an item with an ellipsis character in the description, and I get the following error: UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2026' in position 35: ordinal not in range(256) If I add the convert_unicode=True option to the engine, I am able to get the insert to go through, but the ellipsis doesn't show
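
The error message itself can be reproduced without a database: U+2026 has no representation in Latin-1, while UTF-8 can encode it. The sketch below only demonstrates that cause. The connection URL in the comment is an assumption (hypothetical credentials and database name) showing a commonly used way to request a UTF-8 connection charset; it is not taken from the question.

```python
def can_encode(text, codec):
    """Return True if `text` can be encoded with `codec`, else False."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


description = "Read more\u2026"  # ends with the ellipsis character U+2026

latin1_ok = can_encode(description, "latin-1")  # False: outside Latin-1's range
utf8_ok = can_encode(description, "utf-8")      # True

# Assumption, not from the question: with a driver such as PyMySQL, the
# connection charset is often set in the SQLAlchemy URL, for example
#   mysql+pymysql://user:password@localhost/feeds?charset=utf8mb4
# The table and column character sets would need to support it as well.
```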

Feedparser.parse() 'SSL: CERTIFICATE_VERIFY_FAILED'

不羁的心 submitted on 2019-12-18 05:56:29
Question: I'm having this SSL issue with feedparser when parsing an HTTPS RSS feed. I don't really know what to do, as I can't find any documentation on this error when it comes to feedparser: >>> import feedparser >>> feed = feedparser.parse(rss) >>> feed {'feed': {}, 'bozo': 1, 'bozo_exception': URLError(SSLError(1, u'[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:581)'),), 'entries': []} >>> feed["items"] [] >>> Answer 1: Thank you, cmidi, for the answer, which was to 'monkey patch' using

Feedparser - KeyError: 'fullcount'

混江龙づ霸主 submitted on 2019-12-14 02:43:50
Question: I have tried to follow this guide. It is about making a physical Gmail notifier. When I entered the same code, it raised an error: Traceback (most recent call last): File "C:/Python27/Projects/gmailnotifier.py", line 20, in <module> )["feed"]["fullcount"]) File "C:\Python27\lib\site-packages\feedparser-5.1.3-py2.7.egg\feedparser.py", line 375, in __getitem__ return dict.__getitem__(self, key) KeyError: 'fullcount' I am not sure why, and that's why I'm asking. I am using Windows 7, Python 2.7.3,
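
The traceback ends in a plain dictionary lookup, so the `feed` mapping simply has no `fullcount` key in this response. The sketch below does not explain why the key is missing; it only shows a defensive way to read it. `unread_count` is a hypothetical helper, and plain dicts stand in for the dict-like result of `feedparser.parse`.

```python
def unread_count(parsed, default=0):
    """Read feed.fullcount from a parsed result, falling back to `default`."""
    value = parsed.get("feed", {}).get("fullcount")
    if value is None:
        return default  # key absent, e.g. the response was not the expected feed
    try:
        return int(value)
    except (TypeError, ValueError):
        return default  # present but not a number


# Stand-in results shaped like parsed feeds.
empty_result = {"feed": {}, "entries": []}
good_result = {"feed": {"fullcount": "3"}, "entries": []}

counts = (unread_count(empty_result), unread_count(good_result))
```

Inspecting the full parsed result when the key is absent (for instance its `bozo` flag) may help show what the server actually returned.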

bozo_exception in Django / feedparser

风格不统一 submitted on 2019-12-13 03:37:55
Question: I'm fairly new to Django and Python. I'm trying to build a small RSS reader using feedparser. I'm getting this error and I can't seem to find a solution anywhere: {'feed': {}, 'bozo': 1, 'bozo_exception': TypeError("'Feed' does not have the buffer interface",), 'entries': []} Here are the files involved (simplified to illustrate the problem): ## models class Feed(models.Model): name = models.CharField(max_length=100) url = models.CharField(max_length=100) category = models