Using ElementTree to parse an XML string with a namespace

我与影子孤独终老i 提交于 2019-12-06 03:10:01

The following should get you going:

>>> tree.findall('*/*')
[<Element '{http://www.example.com/dir/}UniqueID' at 0x10899e450>]

This lists all the elements that are two levels below the root of your tree (the UniqueID element, in your case). You can, alternatively, find only the first element at this level, with tree.find(). You can then directly get the text contents of the UniqueID element:

>>> unique_id_elmt = tree.find('*/*')  # First (and only) element two levels below the root
>>> unique_id_elmt
<Element '{http://www.example.com/dir/}UniqueID' at 0x105ec9450>
>>> unique_id_elmt.text  # Text contained in UniqueID
'abcdefghijklmnopqrstuvwxyz0123456789'

Alternatively, you can directly find some precise element by specifying its full path:

>>> tree.find('{{{0}}}Item/{{{0}}}UniqueID'.format(NS))  # Tags are prefixed with NS
<Element '{http://www.example.com/dir/}UniqueID' at 0x10899ead0>

As Tomalak indicated, Fredrik Lundh's site might contain useful information; you want to check how prefixes can be handled: there might in fact be a simpler way to handle them than by making explicit the NS path in the method above.

eyquem

I know there will be some yells of horror and downvotes for my answer as retaliation, because I use module re to analyse an XML string, but note that:

  • in the majority of cases , the following solution won't cause any problem

  • I wish the downvoters will give examples of cases causing problems with my solution

  • I don't parse the string, taking the word 'parse' in the sense of 'obtaining a tree before analysing the tree to find what is searched'; I analyse it: I find directly the whished portion of text where it is

I don't pretend that an XML string must always been analysed with the help of re. It is probably in the majority of cases that an XML string must be parsed with a dedicated parser. I only say that in simple cases like this one , in which a simple and fast analyze is possible, it is easier to use the regex tool, which is, by the way, faster.

import re

xml_string = """<ListObjectsResponse xmlns='http://www.example.com/dir/'>
        <Item>
                <UniqueID>abcdefghijklmnopqrstuvwxyz0123456789</UniqueID>
        </Item>
</ListObjectsResponse>"""

print xml_string
print '\n=================================\n'

print re.search('<UniqueID>(.+?)</UniqueID>', xml_string, re.DOTALL).group(1)

result

<ListObjectsResponse xmlns='http://www.example.com/dir/'>
        <Item>
                <UniqueID>abcdefghijklmnopqrstuvwxyz0123456789</UniqueID>
        </Item>
</ListObjectsResponse>

=================================

abcdefghijklmnopqrstuvwxyz0123456789
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!