Removing an element from a parsed XML tree disrupts iteration

滥情空心 2020-12-10 22:38

I want to parse an xml file, then process the result tree by removing selected elements. My problem is that removing an element disrupts the loop that iterates over the ele

1 Answer
  • 2020-12-10 23:02

    The issue is that you are removing elements from the same sequence you are iterating over. When you remove an element, the remaining elements shift down one position, so the loop skips whatever moved into the freed slot and you can end up removing the wrong elements.
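To make the shifting concrete, here is a minimal sketch of the failure. The sample XML is an assumption (the original test.xml is not shown); it uses a single group with children a, b and c.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for one <group> from test.xml.
group = ET.fromstring("<group><a/><b/><c/></group>")

visited = []
for e in group:              # iterating over the live element
    visited.append(e.tag)
    if e.tag != "a":
        group.remove(e)      # later siblings shift down one position

remaining = [e.tag for e in group]
print(visited)    # <c> never shows up here
print(remaining)  # so <c> is still in the group
```

After `<b>` is removed, `<c>` moves into the position the loop has already passed, so it is never visited.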

    A simple solution is to iterate over a shallow copy of the group's children, or to iterate with reversed:

    copy:

    def processGroup(group):
        # group[:] is a shallow copy of the child list, so we iterate over
        # the copy while removing from the original.
        for e in group[:]:
            if e.tag != 'a':
                group.remove(e)
                showGroup(group, 'removed <' + e.tag + '>')
    

    reversed:

    def processGroup(group):
        # Start at the end. Removing an element only shifts the elements
        # after it, and those have already been visited, so the positions
        # still to come are unaffected as the container shrinks.
        for e in reversed(group):
            if e.tag != 'a':
                group.remove(e)
                showGroup(group, 'removed <' + e.tag + '>')
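As a self-contained check of the reversed approach, the sketch below runs it on an in-memory document. The helper name `keep_only` and the sample XML are assumptions for illustration and are not part of the original code.

```python
import xml.etree.ElementTree as ET

def keep_only(group, tag):
    """Remove every direct child of `group` whose tag differs from `tag`."""
    # Walking backwards means a removal never shifts an unvisited child.
    for e in reversed(group):
        if e.tag != tag:
            group.remove(e)

# Hypothetical sample data; the second group has non-'a' children
# on both sides of the one we keep.
root = ET.fromstring(
    "<root>"
    "<group><a/><b/><c/></group>"
    "<group><b/><a/><b/></group>"
    "</root>"
)

for group in root:
    keep_only(group, "a")

remaining = [[e.tag for e in g] for g in root]
print(remaining)
```

Every non-`a` child is removed from both groups, whether it sits before or after the element that is kept.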
    

    Using the copy logic:

    In [7]: tree = ET.parse('test.xml')
    
    In [8]: root = tree.getroot()
    
    In [9]: for group in root:
       ...:         processGroup(group)
       ...:     
    removed <b>  len=2
    <group>
       <a>
       <c>
    </group>
    
    removed <c>  len=1
    <group>
       <a>
    </group>
    

    You can also use ET.tostring in place of your for loop:

    import xml.etree.ElementTree as ET
    
    def show_group(group, s):
        print(s + '  len=' + str(len(group)))
        # encoding='unicode' makes tostring return str instead of bytes (Python 3)
        print(ET.tostring(group, encoding='unicode'))
    
    
    def process_group(group):
        for e in group[:]:
            if e.tag != 'a':
                group.remove(e)
                show_group(group, 'removed <' + e.tag + '>')
    
    tree = ET.parse('test.xml')
    root = tree.getroot()
    
    for group in root.findall(".//group"):
        process_group(group)
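If the per-removal printout is not needed, another option (not from the answer above, offered only as a sketch) is to rebuild the child list in one step with slice assignment, which avoids removing anything during iteration. The sample XML is again an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for test.xml.
root = ET.fromstring("<root><group><a/><b/><c/></group></root>")

for group in root.findall(".//group"):
    # The list comprehension is fully evaluated before the children
    # are replaced, so the iteration is never disturbed.
    group[:] = [e for e in group if e.tag == "a"]

kept = [e.tag for e in root[0]]
print(kept)
```

This trades the step-by-step output of show_group for a single assignment per group.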
    