Should memory usage increase when using ElementTree.iterparse() when clear()ing trees?

生来不讨喜 2021-01-12 16:27
import sys
import xml.etree.ElementTree as et

for ev, el in et.iterparse(sys.stdin):
    el.clear()

Running the above on the ODP structure RDF d

3 Answers
  •  没有蜡笔的小新
    2021-01-12 17:09

    As mentioned in the answer by Kevin Guerra, the "root.clear()" strategy in the ElementTree documentation only removes fully parsed children of the root. If those children are anchoring huge branches, it's not very helpful.
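    A minimal sketch of the underlying problem, using a synthetic document invented for this illustration (a hypothetical <big> wrapper holding many <item> children). clear() empties an element but leaves it attached to its parent, so one large branch keeps every emptied child alive until the branch itself ends. The helper name retained_children is also hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def retained_children(xml_bytes, tag):
    """Parse the way the question does (clear() every element at its 'end' event)
    and report how many children `tag` still held when its own end tag arrived."""
    retained = None
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes)):  # default events: ("end",)
        if elem.tag == tag:
            retained = len(elem)  # emptied children are still attached at this point
        elem.clear()  # empties elem, but does not detach it from its parent
    return retained


# Synthetic input: one large branch, <big>, holding 10,000 <item> children.
doc = b"<root><big>" + b"<item>x</item>" * 10_000 + b"</big></root>"
print(retained_children(doc, "big"))  # 10000
```

    Every <item> has been cleared by the time </big> is seen, yet all 10,000 of them are still children of <big>. Clearing the root after each of its direct children finishes would not free them any earlier, since <big> is the only direct child.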

    He touched on the ideal solution, but didn't post any code, so here is an example:

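    import sys
    import xml.etree.ElementTree as ET

    stream = sys.stdin.buffer  # assumption: XML read from stdin, as in the question
    element_stack = []  # ancestors of the element currently being parsed
    context = ET.iterparse(stream, events=('start', 'end'))
    for event, elem in context:
        if event == 'start':
            element_stack.append(elem)
        elif event == 'end':
            element_stack.pop()
            # see if elem is one of interest and do something with it here
            if element_stack:
                # detach the finished element from its parent so it can be freed
                element_stack[-1].remove(elem)
    del context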
    

    The element of interest will not have subelements; they'll have been removed as soon as their end tags were seen. This might be OK if all you need is the element's text or attributes.
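    To see what survives, the stack-based loop can be wrapped in a generator and run on a tiny synthetic document. This is a sketch under assumptions: iter_pruned is a hypothetical name, and the input is an in-memory byte string rather than a real dump.

```python
import io
import xml.etree.ElementTree as ET


def iter_pruned(source):
    """Yield every element at its 'end' event, then detach it from its parent.

    Hypothetical helper wrapping the stack-based loop above. Because children are
    detached as soon as their own end tags are processed, a yielded element has no
    subelements left, but its tag, attributes and text are intact.
    """
    element_stack = []  # open ancestors of the current element
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            element_stack.append(elem)
        else:  # "end"
            element_stack.pop()
            yield elem
            if element_stack:
                element_stack[-1].remove(elem)


doc = b'<root><rec id="1">a<sub/>b</rec><rec id="2">c</rec></root>'
for elem in iter_pruned(io.BytesIO(doc)):
    if elem.tag == "rec":
        print(elem.get("id"), repr(elem.text), len(elem))
# 1 'a' 0
# 2 'c' 0
```

    Note that the text "b" after <sub/> is gone as well: ElementTree stores it as sub.tail, so it is discarded together with the removed child. That matters only for mixed content.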

    If you want to query into the element's descendants, you need to let a full branch be built for it. To do this, keep a flag, implemented as a depth counter for those elements so that nested elements of interest are handled correctly, and only call .remove() when the depth is zero:

    import sys
    import xml.etree.ElementTree as ET

    stream = sys.stdin.buffer  # assumption: XML read from stdin, as in the question
    element_stack = []
    interesting_element_depth = 0  # > 0 while inside a 'foo' element
    context = ET.iterparse(stream, events=('start', 'end'))
    for event, elem in context:
        if event == 'start':
            element_stack.append(elem)
            if elem.tag == 'foo':
                interesting_element_depth += 1
        elif event == 'end':
            element_stack.pop()
            if elem.tag == 'foo':
                interesting_element_depth -= 1
                # do something with elem and its descendants here
            # outside any 'foo', detach the finished element from its parent
            if element_stack and not interesting_element_depth:
                element_stack[-1].remove(elem)
    del context
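    Packaged as a generator, the depth-counter version can be checked on a small synthetic document, including a nested <foo>. This is a sketch under assumptions: iter_subtrees and its tag parameter are hypothetical names, the input is an in-memory byte string, and tags are compared literally (a namespaced tag would arrive as '{uri}foo').

```python
import io
import xml.etree.ElementTree as ET


def iter_subtrees(source, tag):
    """Yield each complete `tag` element, with all of its descendants attached.

    Hypothetical generator form of the depth-counter loop above: elements are
    detached from their parents only while no `tag` element is open.
    """
    element_stack = []  # open ancestors of the current element
    depth = 0           # number of currently open `tag` elements
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            element_stack.append(elem)
            if elem.tag == tag:
                depth += 1
        else:  # "end"
            element_stack.pop()
            if elem.tag == tag:
                depth -= 1
                yield elem  # the caller sees the full branch here
            if element_stack and not depth:
                element_stack[-1].remove(elem)


doc = b"<root><skip><x/></skip><foo><a><b/></a></foo><foo><foo><a/></foo></foo></root>"
# Count the descendants of each yielded <foo> (iter() includes the element itself).
counts = [sum(1 for _ in e.iter()) - 1 for e in iter_subtrees(io.BytesIO(doc), "foo")]
print(counts)  # [2, 1, 2]
```

    The inner <foo> is yielded first and is not detached, because the outer <foo> is still open; the outer one then arrives with the inner one intact. With a plain boolean flag, the inner end tag would switch pruning back on too early.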
    
