Python running out of memory parsing XML using cElementTree.iterparse

偶尔善良 提交于 2019-11-28 09:04:12
John Machin

The documentation does tell you "Parses an XML section into an element tree [my emphasis] incrementally" but doesn't cover how to avoid retaining uninteresting elements (which may be all of them). That is covered by this article by the effbot.

I strongly recommend that anybody using .iterparse() should read this article by Liza Daly. It covers both lxml and [c]ElementTree.

Previous coverage on SO:

Using Python Iterparse For Large XML Files
Can Python xml ElementTree parse a very large xml file?
What is the fastest way to parse large XML docs in Python?

Code example:

import xml.etree.cElementTree as etree

def getelements(filename_or_file, tag):
    context = iter(etree.iterparse(filename_or_file, events=('start', 'end')))
    _, root = next(context) # get root element
    for event, elem in context:
        if event == 'end' and elem.tag == tag:
            yield elem
            root.clear() # preserve memory
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!