ElementTree displaying elements out of order

断了今生、忘了曾经 posted on 2019-12-12 01:49:37

Question


I'm using Python's ElementTree to parse XML files. I call findall to get all "revision" subelements, but when I iterate through the result, they are not in document order. What could I be doing wrong?

Here's my code:

# Find every revision under the page, then print each element and its timestamp
allrevisions = page.findall('{http://www.mediawiki.org/xml/export-0.5/}revision')
for rev in allrevisions:
    print(rev)
    print(rev.find('{http://www.mediawiki.org/xml/export-0.5/}timestamp').text)

Here's a link to the document I'm parsing: http://pastie.org/2780983

Thanks, bsg

Edit: Oops. By going through my code and running it piece by piece, I worked out the problem: I had put a reverse() on the elements list in the wrong place, which was causing all the trouble. Thank you so much for your help. I'm sorry it was such a silly issue.
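For readers who hit the same symptom, here is a minimal sketch of how a misplaced reverse() can produce it. The XML snippet and its timestamps are made up for illustration and are not taken from the asker's document. The point is that findall returns an ordinary list, and list.reverse() reorders that list in place, so every later loop over it sees the reversed order.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a page with three revisions
page = ET.fromstring(
    "<page>"
    "<revision><timestamp>2011-01-01</timestamp></revision>"
    "<revision><timestamp>2011-02-01</timestamp></revision>"
    "<revision><timestamp>2011-03-01</timestamp></revision>"
    "</page>"
)

revisions = page.findall("revision")
document_order = [rev.find("timestamp").text for rev in revisions]

# list.reverse() mutates the list in place and returns None
revisions.reverse()
after_reverse = [rev.find("timestamp").text for rev in revisions]

# reversed() yields a reversed view and leaves the original list untouched
fresh = page.findall("revision")
newest_first = [rev.find("timestamp").text for rev in reversed(fresh)]
still_document_order = [rev.find("timestamp").text for rev in fresh]

print(document_order)
print(after_reverse)
```

If a reversed view is only needed in one place, iterating over reversed(...) avoids changing the list that the rest of the code relies on.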


Answer 1:


The documentation for ElementTree says that findall returns the elements in document order.

A quick test shows the correct behaviour:

import xml.etree.ElementTree as et

xmltext = """
<root>
    <number>1</number>
    <number>2</number>
    <number>3</number>
    <number>4</number>
</root>
"""

tree = et.fromstring(xmltext)

# findall returns matching children in document order
for number in tree.findall('number'):
    print(number.text)

Result:

1
2
3
4

It would be helpful to see the document you are parsing.


Update:

Using the source data you provided:

import xml.etree.ElementTree as et

# Namespace used by the MediaWiki export format
NS = '{http://www.mediawiki.org/xml/export-0.5/}'

# Let ElementTree read the file so it can honour the XML encoding
tree = et.parse('xmldata.xml').getroot()

# './/' searches all descendants; matches come back in document order
for revision in tree.findall('.//' + NS + 'revision'):
    # Print the first 10 characters of each revision's text (Python 3)
    print(revision.find(NS + 'text').text[0:10])

Result:

‘The Mind 
{{db-spam}
‘The Mind 
'''The Min
<!-- Pleas

This is the same order in which the revisions appear in the document.
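As a side note, repeating the full namespace URI in every path is easy to get wrong. A minimal sketch of the alternative follows, using the optional namespaces argument of find and findall. The sample XML and the prefix "mw" are assumptions made for this example; only the namespace URI comes from the question.

```python
import xml.etree.ElementTree as ET

# Small made-up sample in the MediaWiki export namespace
XML = """
<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/">
  <page>
    <revision><timestamp>2011-10-01T00:00:00Z</timestamp></revision>
    <revision><timestamp>2011-10-02T00:00:00Z</timestamp></revision>
    <revision><timestamp>2011-10-03T00:00:00Z</timestamp></revision>
  </page>
</mediawiki>
"""

# The prefix "mw" is an arbitrary local name chosen for this example
NS = {"mw": "http://www.mediawiki.org/xml/export-0.5/"}

root = ET.fromstring(XML)

# Collect timestamps in the order findall returns the revisions
timestamps = [
    rev.find("mw:timestamp", NS).text
    for rev in root.findall(".//mw:revision", NS)
]
print(timestamps)
```

Collecting the timestamps into a list like this also makes it easy to compare the returned order against the order in the file.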



Source: https://stackoverflow.com/questions/7942875/elementtree-displaying-elements-out-of-order
