A simplified version of my XML parsing function is here:
import xml.etree.cElementTree as ET

def analyze(xml):
    it = ET.iterparse(file(xml))
    count =
Code example:
try:
    import xml.etree.cElementTree as etree  # C accelerator on older Pythons
except ImportError:
    import xml.etree.ElementTree as etree  # cElementTree was removed in Python 3.9

def getelements(filename_or_file, tag):
    context = iter(etree.iterparse(filename_or_file, events=('start', 'end')))
    _, root = next(context)  # get root element
    for event, elem in context:
        if event == 'end' and elem.tag == tag:
            yield elem
            root.clear()  # preserve memory
The documentation does tell you that iterparse "Parses an XML section into an element tree [my emphasis] incrementally", but it doesn't cover how to avoid retaining uninteresting elements (which may be all of them). That is covered by this article by the effbot.
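To see what "retaining" means in practice, here is a minimal sketch. It is my own illustration, not code from the documentation or the article. The sample document, the `retained_children` helper and its `mode` switch are all made up for the demonstration. It parses the same data three ways and reports how many children are still attached to the root afterwards.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: 1000 <item> elements directly under <root>.
SAMPLE = (
    "<root>"
    + "".join("<item id='%d'>x</item>" % i for i in range(1000))
    + "</root>"
).encode("utf-8")


def retained_children(mode):
    """Parse SAMPLE incrementally; return (items_seen, children_left_on_root).

    mode is an illustrative switch, not part of any library API:
      "none" - never clear anything
      "elem" - call elem.clear() on each finished <item>
      "root" - call root.clear() after each finished <item>
    """
    context = iter(ET.iterparse(io.BytesIO(SAMPLE), events=("start", "end")))
    _, root = next(context)  # the first event is the 'start' of the root element
    seen = 0
    for event, elem in context:
        if event == "end" and elem.tag == "item":
            seen += 1
            if mode == "elem":
                elem.clear()  # empties the item, but it stays attached to root
            elif mode == "root":
                root.clear()  # detaches every child parsed so far
    return seen, len(root)


for mode in ("none", "elem", "root"):
    print(mode, retained_children(mode))
```

With this sample, "none" and "elem" both leave all 1000 children attached to the root (in the "elem" case as empty shells), while "root" leaves none. That is why the generator above grabs the root first and clears it, rather than clearing only the element it has just yielded.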
I strongly recommend that anybody using .iterparse() read this article by Liza Daly. It covers both lxml and [c]ElementTree.
Previous coverage on SO:
Using Python Iterparse For Large XML Files
Can Python xml ElementTree parse a very large xml file?
What is the fastest way to parse large XML docs in Python?