Python running out of memory parsing XML using cElementTree.iterparse

慢半拍i · 2020-12-09 16:39

A simplified version of my XML parsing function is here:

import xml.etree.cElementTree as ET

def analyze(xml):
    it = ET.iterparse(file(xml))
    count = 0
    for event, elem in it:
        count += 1
    print(count)

2 Answers
  • 2020-12-09 17:09

    Code example: a generator that yields each matching element and then clears the root, so already-processed elements do not accumulate in memory:

    import xml.etree.cElementTree as etree
    
    def getelements(filename_or_file, tag):
        context = iter(etree.iterparse(filename_or_file, events=('start', 'end')))
        _, root = next(context) # get root element
        for event, elem in context:
            if event == 'end' and elem.tag == tag:
                yield elem
                root.clear() # preserve memory
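
    For illustration, a hypothetical way to use this generator (the file name and tag below are placeholders, not from the question):

    count = 0
    for elem in getelements('huge.xml', 'item'):
        count += 1   # work with elem here; root.clear() runs when the generator resumes
    print(count)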
    
  • 2020-12-09 17:15

    The documentation does tell you "Parses an XML section into an element tree [my emphasis] incrementally" but doesn't cover how to avoid retaining uninteresting elements (which may be all of them). That is covered by this article by the effbot.

    I strongly recommend that anybody using .iterparse() should read this article by Liza Daly. It covers both lxml and [c]ElementTree.
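
    As a rough sketch of the technique those articles describe, here is the lxml variant (the function name, tag name, and file argument are placeholder assumptions, not from the answer):

    from lxml import etree

    def fast_iter(filename, tag):
        # Yield one matching element at a time, then discard it so the
        # partially built tree never grows: clear the element and drop
        # the references to already-processed siblings held by its parent.
        for event, elem in etree.iterparse(filename, events=('end',), tag=tag):
            yield elem
            elem.clear()
            while elem.getprevious() is not None:
                del elem.getparent()[0]

    Each yielded element is fully parsed when you receive it, and everything parsed before it has already been freed.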

    Previous coverage on SO:

    Using Python Iterparse For Large XML Files
    Can Python xml ElementTree parse a very large xml file?
    What is the fastest way to parse large XML docs in Python?
