Can Python xml ElementTree parse a very large xml file?

误落风尘 2020-12-17 19:45

I'm trying to parse a large file (> 2GB) of structured markup data, and there isn't enough memory to load it all. Which XML parsing approach is best suited to this situation?

5 Answers
  •  臣服心动
    2020-12-17 20:24

    The only API I've seen that can handle this sort of thing at all is pulldom:

    http://docs.python.org/library/xml.dom.pulldom.html

    Pulldom uses the SAX API to build partial DOM nodes; by pulling in specific sub-trees as a group and then discarding them when you're done, you get the memory efficiency of SAX with the ease of use of the DOM.
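    A minimal sketch of that pattern, assuming the file is called big.xml and the repeated element is <record> (both names are just placeholders):

        from xml.dom import pulldom

        # Stream events from the file; no full document tree is built.
        doc = pulldom.parse("big.xml")

        for event, node in doc:
            if event == pulldom.START_ELEMENT and node.tagName == "record":
                # Pull just this sub-tree into a regular DOM element.
                doc.expandNode(node)
                print(node.toxml())
                # Once the reference is dropped, the sub-tree can be
                # garbage-collected, so memory use stays roughly constant.

    Inside the loop each <record> behaves like an ordinary DOM node, but only one of them is in memory at a time.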

    It's an incomplete API; when I used it I had to modify it to make it fully usable, but it works as a foundation. I don't use it anymore, so I don't recall exactly what I had to add; consider this an advance warning.

    It's very slow.

    XML is a very poor format for handling large data sets. If you have any control over the source data, and if it makes sense for the data set, you're much better off breaking the data apart into smaller chunks that you can parse entirely into memory.

    The other option is using SAX APIs, but they're a serious pain to do anything nontrivial with directly.
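    For comparison, this is roughly what the SAX route looks like, again with a hypothetical <record> element in big.xml; you have to reassemble the text yourself from the callbacks, which is where the pain comes from:

        import xml.sax

        class RecordHandler(xml.sax.ContentHandler):
            def __init__(self):
                super().__init__()
                self.count = 0
                self.buffer = []

            def startElement(self, name, attrs):
                if name == "record":
                    self.buffer = []      # start collecting text for this record

            def characters(self, content):
                self.buffer.append(content)

            def endElement(self, name):
                if name == "record":
                    self.count += 1
                    text = "".join(self.buffer)
                    # process `text` here; nothing is retained afterwards

        handler = RecordHandler()
        xml.sax.parse("big.xml", handler)
        print(handler.count, "records seen")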
