ElementTree : Element.remove() jumping iteration

Submitted by 不问归期 on 2019-11-28 05:33:10

Question


I have this XML input file:

<?xml version="1.0"?>
<zero>
  <First>
    <second>
      <third-num>1</third-num>
      <third-def>object001</third-def>
      <third-len>458</third-len>
    </second>
    <second>
      <third-num>2</third-num>
      <third-def>object002</third-def>
      <third-len>426</third-len>
    </second>
    <second>
      <third-num>3</third-num>
      <third-def>object003</third-def>
      <third-len>998</third-len>
    </second>
  </First>
</zero>

My goal is to remove every <second> element whose <third-def> does not hold a given value (here, object001). To do that, I wrote this code:

try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET
inputfile='inputfile.xml'
tree = ET.parse(inputfile)
root = tree.getroot()

elem = tree.find('First')
for elem2 in tree.iter(tag='second'):
    if elem2.find('third-def').text == 'object001':
        pass
    else:
        elem.remove(elem2)
        #elem2.clear()

My problem is with elem.remove(elem2): the loop skips every other <second> element. Here is the output of this code:

<?xml version="1.0" ?>
<zero>
  <First>
    <second>
      <third-num>1</third-num>
      <third-def>object001</third-def>
      <third-len>458</third-len>
    </second>
    <second>
      <third-num>3</third-num>
      <third-def>object003</third-def>
      <third-len>998</third-len>
    </second>
  </First>
</zero>

Now if I un-comment the elem2.clear() line, the script works perfectly, but the output is less nice as it keeps all the removed second levels:

<?xml version="1.0" ?>
<zero>
  <First>
    <second>
      <third-num>1</third-num>
      <third-def>object001</third-def>
      <third-len>458</third-len>
    </second>
    <second/>
    <second/>
  </First>
</zero>

Does anybody have a clue why my element.remove() statement is wrong?


Answer 1:


You are looping over the live tree:

for elem2 in tree.iter(tag='second'):

which you then change while iterating. The iterator's position counter is not told that the number of elements has changed, so after it looks at element 0 and you remove that element, it moves on to element number 1. But what was element number 1 is now element number 0, so that element is never visited.
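To make the skipping visible, here is a minimal sketch that records which elements the loop body actually sees. It assumes a trimmed copy of the question's XML embedded as a string (the `XML`, `visited`, and `remaining` names are illustrative and not part of the original code):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's input, embedded so the sketch is self-contained.
XML = """<zero>
  <First>
    <second><third-num>1</third-num><third-def>object001</third-def></second>
    <second><third-num>2</third-num><third-def>object002</third-def></second>
    <second><third-num>3</third-num><third-def>object003</third-def></second>
  </First>
</zero>"""

root = ET.fromstring(XML)
first = root.find('First')

visited = []
for second in root.iter('second'):           # live iteration over the tree
    visited.append(second.find('third-num').text)
    if second.find('third-def').text != 'object001':
        first.remove(second)                 # shrinks the child list being walked

remaining = [s.find('third-num').text for s in first.findall('second')]
print(visited)    # elements the loop body actually saw
print(remaining)  # elements still in the tree afterwards
```

The third <second> element never appears in `visited`: once the second one is removed, the third slides into its slot, which the iterator has already passed.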

Capture a list of all the elements first, then loop over that:

for elem2 in tree.findall('.//second'):

.findall() returns a list of results, which doesn't update as you alter the tree.
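As a self-contained sketch of that fix, the loop below runs over the list returned by findall() and removes from the tree as it goes. The embedded `XML` string is a trimmed stand-in for the question's input file, and `kept` is an illustrative name:

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for inputfile.xml so the sketch runs on its own.
XML = """<zero>
  <First>
    <second><third-num>1</third-num><third-def>object001</third-def></second>
    <second><third-num>2</third-num><third-def>object002</third-def></second>
    <second><third-num>3</third-num><third-def>object003</third-def></second>
  </First>
</zero>"""

root = ET.fromstring(XML)
first = root.find('First')

# findall() builds its result list up front, so removing elements
# from the tree inside the loop cannot disturb the iteration.
for second in root.findall('.//second'):
    if second.find('third-def').text != 'object001':
        first.remove(second)

kept = [s.find('third-def').text for s in root.iter('second')]
print(kept)
```

Wrapping the original iterator in a list, as in `list(root.iter('second'))`, should work the same way, since it also takes a snapshot before any removal happens.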

Now the iteration won't skip the last element:

>>> print(ET.tostring(root, encoding='unicode'))
<zero>
  <First>
    <second>
      <third-num>1</third-num>
      <third-def>object001</third-def>
      <third-len>458</third-len>
    </second>
    </First>
</zero>

This phenomenon is not limited to ElementTree trees; see Loop "Forgets" to Remove Some Items.
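The same effect can be reproduced with a plain Python list. The sketch below uses made-up sample values to contrast removing during iteration with iterating over a copy:

```python
# Removing while iterating: the list shrinks under the loop's index.
buggy = [2, 4, 6, 7]
for n in buggy:
    if n % 2 == 0:
        buggy.remove(n)      # the next item slides into the current slot and is skipped

# Iterating over a snapshot: the copy is unaffected by the removals.
fixed = [2, 4, 6, 7]
for n in list(fixed):
    if n % 2 == 0:
        fixed.remove(n)

print(buggy)  # an even number survives
print(fixed)  # only the odd number is left
```

Building a new list with a comprehension, for example `[n for n in items if n % 2]`, avoids the problem altogether when in-place mutation is not required.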



Source: https://stackoverflow.com/questions/22817530/elementtree-element-remove-jumping-iteration
