I want to parse an xml file, then process the result tree by removing selected elements. My problem is that removing an element disrupts the loop that iterates over the ele
The issue is that you are removing elements from the group while you are iterating over it. When you remove an element, the remaining elements shift down one position, so the loop skips the element that moved into the vacated slot and never examines it.
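A minimal sketch that reproduces the skipping, assuming a hypothetical in-memory group with children a, b and c (this sample is not your test.xml):

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, built in memory for illustration only.
group = ET.fromstring("<group><a/><b/><c/></group>")

visited = []
for e in group:             # iterating over the live element
    visited.append(e.tag)
    if e.tag != 'a':
        group.remove(e)     # children after e shift one position left

remaining = [e.tag for e in group]
print(visited)    # tags the loop actually looked at
print(remaining)  # tags still in the group afterwards
```

Here <c> slides into the position <b> occupied, the loop moves on past that position, and <c> is never visited, so it stays in the group.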
A simple solution is to iterate over a copy of the group's children, or to iterate in reverse using reversed:
copy:
def processGroup(group):
    # creates a shallow copy so we are removing from the original
    # but iterating over a copy.
    for e in group[:]:
        if e.tag != 'a':
            group.remove(e)
            showGroup(group,'removed <' + e.tag + '>')
reversed:
def processGroup(group):
    # Iterate from the end. Removing an element only shifts the
    # elements after it, which have already been visited, so the
    # positions still to be visited are unaffected.
    for e in reversed(group):
        if e.tag != 'a':
            group.remove(e)
            showGroup(group,'removed <' + e.tag + '>')
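To check that both variants leave only <a> children, here is a self-contained sketch. The sample XML and the helper names remove_via_copy and remove_via_reversed are made up for illustration. They mirror the two processGroup versions above but return the removed tags instead of calling showGroup.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; not the asker's test.xml.
SAMPLE = ("<root>"
          "<group><a/><b/><c/></group>"
          "<group><b/><a/><c/><a/></group>"
          "</root>")

def remove_via_copy(group):
    """Iterate over a shallow copy of the children, remove from the original."""
    removed = []
    for e in group[:]:
        if e.tag != 'a':
            group.remove(e)
            removed.append(e.tag)
    return removed

def remove_via_reversed(group):
    """Iterate from the last child to the first, removing as we go."""
    removed = []
    for e in reversed(group):
        if e.tag != 'a':
            group.remove(e)
            removed.append(e.tag)
    return removed

# Parse the sample twice so each variant works on a fresh tree.
root_copy = ET.fromstring(SAMPLE)
root_rev = ET.fromstring(SAMPLE)

copy_order = [remove_via_copy(g) for g in root_copy]
rev_order = [remove_via_reversed(g) for g in root_rev]

copy_tags = [[e.tag for e in g] for g in root_copy]
rev_tags = [[e.tag for e in g] for g in root_rev]
print(copy_order, copy_tags)
print(rev_order, rev_tags)
```

Both variants end with the same children. The only difference is the order in which elements are removed: front to back with the copy, back to front with reversed.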
Using the copy logic:
In [7]: tree = ET.parse('test.xml')
In [8]: root = tree.getroot()
In [9]: for group in root:
   ...:     processGroup(group)
   ...:
removed <b> len=2
<group>
<a>
<c>
</group>
removed <c> len=1
<group>
<a>
</group>
You can also use ET.tostring in place of the for loop in your show function:
import xml.etree.ElementTree as ET

def show_group(group, s):
    print(s + ' len=' + str(len(group)))
    # encoding='unicode' returns a str; the default returns bytes on Python 3
    print(ET.tostring(group, encoding='unicode'))

def process_group(group):
    # iterate over a shallow copy so removals don't disturb the loop
    for e in group[:]:
        if e.tag != 'a':
            group.remove(e)
            show_group(group, 'removed <' + e.tag + '>')

tree = ET.parse('test.xml')
root = tree.getroot()
for group in root.findall(".//group"):
    process_group(group)