Removing an element from a parsed XML tree disrupts iteration

前端未结


I want to parse an XML file, then process the resulting tree by removing selected elements. My problem is that removing an element disrupts the loop that iterates over the ele

1 Answer

庸人自扰

2020-12-10 23:02

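The issue is that you are removing children from the same element you are iterating over. When you remove a child, every child after it shifts one position to the left, so the loop skips the child that slides into the removed one's place and never examines it. As a result, some elements you meant to remove stay in the tree.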
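To make the skipping concrete, here is a minimal sketch on a hypothetical inline sample (not the question's test.xml) that applies the removal loop directly to the element it is iterating over:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one <group> with three children; only <a> should survive.
group = ET.fromstring("<group><a/><b/><c/></group>")

# Buggy pattern: removing children while iterating over the same element.
for e in group:
    if e.tag != 'a':
        group.remove(e)

remaining = [child.tag for child in group]
print(remaining)  # ['a', 'c']
```

After <b> at position 1 is removed, <c> moves into position 1, the loop advances to position 2, finds nothing there, and stops, so <c> is never checked.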

A simple solution is to iterate over a copy of the group's children, or to iterate in reverse with reversed():

Iterating over a copy:

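def processGroup(group):
    # group[:] makes a shallow copy (a plain list of the children), so we
    # remove from the original element while iterating over the copy.
    # showGroup is the question's printing helper (not shown here).
    for e in group[:]:
        if e.tag != 'a':
            group.remove(e)
            showGroup(group, 'removed <' + e.tag + '>')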

Iterating in reverse:

def processGroup(group):
    # Start at the end and walk backwards. Removing the current child only
    # shifts the children after it, which have already been visited, so the
    # positions still to be visited are unaffected.
    for e in reversed(group):
        if e.tag != 'a':
            group.remove(e)
            showGroup(group, 'removed <' + e.tag + '>')
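Since test.xml itself isn't shown, here is a self-contained sketch that runs both versions on a hypothetical inline sample, drops the question's showGroup printing call, and returns the remaining child tags so the two strategies can be compared. The function names are illustrative only:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with two groups whose children are in different orders.
SAMPLE = "<root><group><a/><b/><c/></group><group><b/><a/><c/></group></root>"

def keep_only_a_copy(group):
    # group[:] is a list snapshot, so removals do not affect the loop.
    for e in group[:]:
        if e.tag != 'a':
            group.remove(e)

def keep_only_a_reversed(group):
    # Walking backwards: removing index i never shifts indices below i.
    for e in reversed(group):
        if e.tag != 'a':
            group.remove(e)

def run(strategy):
    # Parse a fresh tree each time so the strategies don't share state.
    root = ET.fromstring(SAMPLE)
    for group in root:
        strategy(group)
    return [[child.tag for child in group] for group in root]

print(run(keep_only_a_copy))      # [['a'], ['a']]
print(run(keep_only_a_reversed))  # [['a'], ['a']]
```

Both leave only the <a> children; the copy version visits children in document order, while the reversed version removes them from last to first.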

Running the copy version in IPython:

In [7]: tree = ET.parse('test.xml')

In [8]: root = tree.getroot()

In [9]: for group in root:
   ...:     processGroup(group)
   ...:
removed <b>  len=2
<group>
   <a>
   <c>
</group>

removed <c>  len=1
<group>
   <a>
</group>

You can also use ET.tostring in place of the for loop in your showGroup function. A complete version, with snake_case names, looks like this:

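import xml.etree.ElementTree as ET

def show_group(group, s):
    # Print a label and child count, then the serialized group.
    # encoding='unicode' makes tostring return str instead of bytes.
    print(s + '  len=' + str(len(group)))
    print(ET.tostring(group, encoding='unicode'))


def process_group(group):
    # group[:] is a list snapshot of the children, so removing from
    # group does not disturb this loop.
    for e in group[:]:
        if e.tag != 'a':
            group.remove(e)
            show_group(group, 'removed <' + e.tag + '>')

tree = ET.parse('test.xml')
root = tree.getroot()

# ".//group" matches <group> elements at any depth below root.
for group in root.findall(".//group"):
    process_group(group)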
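Another option, not used in the answer above, is a two-pass approach: first collect the children to drop, then remove them in a separate loop, so no removal ever happens while the element is being iterated. A minimal sketch, where remove_unwanted and its keep_tag parameter are hypothetical names and the XML is an inline sample:

```python
import xml.etree.ElementTree as ET

def remove_unwanted(group, keep_tag='a'):
    # First pass: decide what to remove without modifying the tree.
    doomed = [e for e in group if e.tag != keep_tag]
    # Second pass: remove them; this loop iterates over the separate list.
    for e in doomed:
        group.remove(e)
    return doomed

root = ET.fromstring("<root><group><a/><b/><c/></group></root>")
group = root.find("group")
removed = remove_unwanted(group)
print([e.tag for e in removed])                  # ['b', 'c']
print(ET.tostring(group, encoding="unicode"))    # <group><a /></group>
```

This is equivalent in effect to iterating over group[:], but keeps the selection step separate, which can make it easier to log or inspect what will be removed before changing the tree.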
