Parsing RSS with ElementTree in Python

ⅰ亾dé卋堺 submitted on 2020-01-13 09:04:57

Question


How do you search for namespace-specific tags in XML using ElementTree in Python?

I have an XML/RSS document like:

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:wfw="http://wellformedweb.org/CommentAPI/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:wp="http://wordpress.org/export/1.0/"
>
<channel>
    <title>sometitle</title>
    <pubDate>Tue, 28 Aug 2012 22:36:02 +0000</pubDate>
    <generator>http://wordpress.org/?v=2.5.1</generator>
    <language>en</language>
    <wp:wxr_version>1.0</wp:wxr_version>
    <wp:category><wp:category_nicename>apache</wp:category_nicename><wp:category_parent></wp:category_parent><wp:cat_name><![CDATA[Apache]]></wp:cat_name></wp:category>
</channel>
</rss>

But when I try and find all "wp:category" tags by doing:

import xml.etree.ElementTree as xml
tree = xml.parse(fn)
doc = tree.getroot()
categories = doc.findall('channel/wp:category')

I get the error:

SyntaxError: prefix 'wp' not found in prefix map

Searching for any non-namespace specific fields works just fine. What am I doing wrong?


Answer 1:


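ElementTree does not resolve a prefix such as wp from the document's own xmlns declarations when you search, so you need to handle the namespace prefixes yourself. You can either use iterparse and handle the namespace events directly, or explicitly declare the prefixes you're interested in as a prefix-to-URI mapping and pass it to findall. Depending on what you're trying to do, there is also a cruder option: I will admit that in my lazier moments I just strip all the prefixes out with a string replace before parsing the XML.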
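A minimal sketch of both approaches, assuming a trimmed-down copy of the feed from the question. The names SAMPLE, find_with_declared_prefixes and find_with_iterparse are made up for illustration; only the xml.etree.ElementTree calls come from the standard library.

```python
import io
import xml.etree.ElementTree as ET

# Trimmed-down copy of the feed from the question (illustrative data only).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:wp="http://wordpress.org/export/1.0/">
<channel>
    <title>sometitle</title>
    <wp:category><wp:category_nicename>apache</wp:category_nicename><wp:cat_name><![CDATA[Apache]]></wp:cat_name></wp:category>
</channel>
</rss>
"""


def find_with_declared_prefixes(xml_bytes):
    """Declare the prefix by hand and pass the map to findall."""
    # The URI must match the xmlns:wp declaration in the document exactly.
    ns = {"wp": "http://wordpress.org/export/1.0/"}
    root = ET.fromstring(xml_bytes)
    return root.findall("channel/wp:category", ns)


def find_with_iterparse(xml_bytes):
    """Collect the prefix map from the document's own declarations."""
    ns = {}
    root = None
    events = ("start-ns", "start")
    for event, payload in ET.iterparse(io.BytesIO(xml_bytes), events=events):
        if event == "start-ns":
            # For start-ns events the payload is a (prefix, uri) tuple.
            prefix, uri = payload
            ns[prefix] = uri
        elif root is None:
            # The first start event belongs to the root element.
            root = payload
    # The loop has consumed the whole input, so the tree under root is complete.
    return root.findall("channel/wp:category", ns), ns


if __name__ == "__main__":
    for category in find_with_declared_prefixes(SAMPLE):
        # Internally the tag is stored in {uri}localname form.
        print(category.tag)

    categories, ns = find_with_iterparse(SAMPLE)
    for category in categories:
        print(category.find("wp:cat_name", ns).text)
```

The first function is the shorter route when you already know which namespaces you care about. The second avoids hard-coding URIs, at the cost of trusting whatever prefixes the document happens to use.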

EDIT: this similar question might help.



Source: https://stackoverflow.com/questions/12861752/parsing-rss-with-elementtree-in-python
