Why is lxml.etree.iterparse() eating up all my memory?

孤街浪徒 · 2020-12-01 06:50

This eventually consumes all my available memory and then the process is killed. I've tried changing the tag from schedule to 'smaller' tags, but that didn't …
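
(The code being run isn't shown here; a minimal sketch of the pattern the question describes, assuming a hypothetical file data.xml and the schedule tag mentioned above, might look like this:)

    from lxml import etree

    # Nothing is cleared after each element is handled, so the partially built
    # tree keeps every parsed element in memory as the file is read.
    for event, elem in etree.iterparse("data.xml", events=("end",), tag="schedule"):
        pass  # process elem here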

3 Answers
  •  时光取名叫无心
    2020-12-01 07:13

    Directly copied from http://effbot.org/zone/element-iterparse.htm

    Note that iterparse still builds a tree, just like parse, but you can safely rearrange or remove parts of the tree while parsing. For example, to parse large files, you can get rid of elements as soon as you’ve processed them:

    for event, elem in iterparse(source):
        if elem.tag == "record":
            ...  # process record elements here
            elem.clear()
    

    The above pattern has one drawback; it does not clear the root element, so you will end up with a single element with lots of empty child elements. If your files are huge, rather than just large, this might be a problem. To work around this, you need to get your hands on the root element. The easiest way to do this is to enable start events, and save a reference to the first element in a variable:

    # get an iterable
    context = iterparse(source, events=("start", "end"))
    
    # turn it into an iterator
    context = iter(context)
    
    # get the root element
    event, root = next(context)   # context.next() in the original Python 2 example
    
    for event, elem in context:
        if event == "end" and elem.tag == "record":
            ...  # process record elements here
            root.clear()
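
    The effbot example above uses xml.etree; since the question is about lxml.etree specifically, a minimal sketch of the same idea adapted to lxml might look like the following (data.xml and the schedule tag are assumptions taken from the question). lxml's iterparse can filter on a tag directly, and getprevious()/getparent() can be used to drop already-processed siblings so the root element does not keep growing:

    from lxml import etree

    context = etree.iterparse("data.xml", events=("end",), tag="schedule")

    for event, elem in context:
        ...  # process the <schedule> element here
        elem.clear()                          # free the element's children and text
        while elem.getprevious() is not None:
            del elem.getparent()[0]           # drop already-processed siblings from the root

    del context

    Deleting the preceding siblings is what keeps the root from accumulating empty child elements; it plays the same role as the root.clear() call in the effbot version.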
    
