Why is lxml.etree.iterparse() eating up all my memory?

孤街浪徒 · 2020-12-01 06:50

This eventually consumes all my available memory and then the process is killed. I've tried changing the tag from schedule to 'smaller' tags, but that didn't …
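
(The code being run isn't shown here; a minimal sketch of the pattern the question describes, assuming a hypothetical file data.xml and the schedule tag mentioned above, might look like this:)

    from lxml import etree

    # Nothing is cleared after each element is handled, so the partially built
    # tree keeps every parsed element in memory as the file is read.
    for event, elem in etree.iterparse("data.xml", events=("end",), tag="schedule"):
        pass  # process elem here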

3 Answers
  •  时光取名叫无心
    2020-12-01 07:13

    Directly copied from http://effbot.org/zone/element-iterparse.htm

    Note that iterparse still builds a tree, just like parse, but you can safely rearrange or remove parts of the tree while parsing. For example, to parse large files, you can get rid of elements as soon as you’ve processed them:

    for event, elem in iterparse(source):
        if elem.tag == "record":
            ...  # process record elements here
            elem.clear()
    

    The above pattern has one drawback; it does not clear the root element, so you will end up with a single element with lots of empty child elements. If your files are huge, rather than just large, this might be a problem. To work around this, you need to get your hands on the root element. The easiest way to do this is to enable start events, and save a reference to the first element in a variable:

    # get an iterable
    context = iterparse(source, events=("start", "end"))
    
    # turn it into an iterator
    context = iter(context)
    
    # get the root element
    event, root = next(context)   # context.next() in the original Python 2 example
    
    for event, elem in context:
        if event == "end" and elem.tag == "record":
            ...  # process record elements here
            root.clear()
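
    The effbot example above uses xml.etree; since the question is about lxml.etree specifically, a minimal sketch of the same idea adapted to lxml might look like the following (data.xml and the schedule tag are assumptions taken from the question). lxml's iterparse can filter on a tag directly, and getprevious()/getparent() can be used to drop already-processed siblings so the root element does not keep growing:

    from lxml import etree

    context = etree.iterparse("data.xml", events=("end",), tag="schedule")

    for event, elem in context:
        ...  # process the <schedule> element here
        elem.clear()                          # free the element's children and text
        while elem.getprevious() is not None:
            del elem.getparent()[0]           # drop already-processed siblings from the root

    del context

    Deleting the preceding siblings is what keeps the root from accumulating empty child elements; it plays the same role as the root.clear() call in the effbot version.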
    
