How to parse an XML feed using Python?

[愿得一人] 2021-01-01 03:11

I am trying to parse this XML feed (http://www.reddit.com/r/videos/top/.rss) and am having trouble doing so. I am trying to save the YouTube links in each of the items, but am

2 answers
  •  难免孤独
    2021-01-01 03:37

    I wrote this for you using XPath expressions (tested successfully):

    from urllib.request import Request, urlopen

    from lxml import etree

    # Send a browser-like User-Agent header with the request.
    headers = {'User-Agent': 'Mozilla/5.0'}
    req = Request('http://www.reddit.com/r/videos/top/.rss', None, headers)
    reddit_file = urlopen(req).read()  # raw bytes of the feed

    reddit = etree.fromstring(reddit_file)

    # Each <item> under /rss/channel is one post.
    for item in reddit.xpath('/rss/channel/item'):
        print("title =", item.xpath("./title/text()")[0])
        print("description =", item.xpath("./description/text()")[0])
        # local-name() matches the thumbnail element regardless of its namespace prefix.
        print("thumbnail =", item.xpath("./*[local-name()='thumbnail']/@url")[0])
        print("link =", item.xpath("./link/text()")[0])
        print("-" * 100)
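
The question asks specifically for the YouTube links in each item, which the snippet above does not isolate. Below is a minimal sketch of one way to do that, assuming the links appear as plain URLs inside each item's description. The sample feed, the `youtube_links` helper, and the URL pattern are all hypothetical and only for illustration; it uses the standard-library `xml.etree.ElementTree` instead of lxml so it runs without a network connection or extra packages.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample feed standing in for the downloaded RSS document.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>sample videos feed</title>
    <item>
      <title>First video</title>
      <link>http://www.reddit.com/r/videos/comments/abc/first_video/</link>
      <description>&lt;a href="http://www.youtube.com/watch?v=AAAAAAAAAAA"&gt;[link]&lt;/a&gt;</description>
    </item>
    <item>
      <title>Second video</title>
      <link>http://www.reddit.com/r/videos/comments/def/second_video/</link>
      <description>&lt;a href="http://youtu.be/BBBBBBBBBBB"&gt;[link]&lt;/a&gt;</description>
    </item>
  </channel>
</rss>
"""

# Assumed pattern: matches youtube.com/watch?v=... and youtu.be/... URLs only.
YOUTUBE_RE = re.compile(
    r'https?://(?:www\.)?(?:youtube\.com/watch\?v=|youtu\.be/)[\w-]+'
)


def youtube_links(rss_text):
    """Return a list of (title, [YouTube URLs]) pairs, one per <item>."""
    root = ET.fromstring(rss_text)  # root is the <rss> element
    results = []
    for item in root.findall('./channel/item'):
        title = item.findtext('title', default='')
        description = item.findtext('description', default='')
        # The parser has already decoded &lt;/&gt;, so the description is plain HTML text.
        results.append((title, YOUTUBE_RE.findall(description)))
    return results


for title, links in youtube_links(SAMPLE_RSS):
    print(title, links)
```

To use this with the live feed, pass the downloaded feed contents to `youtube_links` in place of `SAMPLE_RSS`; the pattern would likely need extending if the feed uses other YouTube URL forms.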
    
