Can Python xml ElementTree parse a very large xml file?

误落风尘 2020-12-17 19:45

I'm trying to parse a large file (> 2GB) of structured markup data, and there isn't enough memory to load it all. Which XML parsing approach is best suited to this situation?

5 Answers
  •  臣服心动
    2020-12-17 20:24

    The only API I've seen that can handle this sort of thing at all is pulldom:

    http://docs.python.org/library/xml.dom.pulldom.html

    Pulldom uses the SAX API to build partial DOM nodes; by pulling in specific sub-trees as a group and then discarding them when you're done, you get the memory efficiency of SAX with the ease of use of the DOM.
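    A minimal sketch of that pattern, assuming the file is called big.xml and the repeated element is <record> (both names are just placeholders):

        from xml.dom import pulldom

        # Stream events from the file; no full document tree is built.
        doc = pulldom.parse("big.xml")

        for event, node in doc:
            if event == pulldom.START_ELEMENT and node.tagName == "record":
                # Pull just this sub-tree into a regular DOM element.
                doc.expandNode(node)
                print(node.toxml())
                # Once the reference is dropped, the sub-tree can be
                # garbage-collected, so memory use stays roughly constant.

    Inside the loop each <record> behaves like an ordinary DOM node, but only one of them is in memory at a time.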

    It's an incomplete API; when I used it I had to modify it to make it fully usable, but it works as a foundation. I don't use it anymore, so I don't recall exactly what I had to add; consider this an advance warning.

    It's very slow.

    XML is a very poor format for handling large data sets. If you have any control over the source data, and if it makes sense for the data set, you're much better off breaking the data apart into smaller chunks that you can parse entirely into memory.

    The other option is using SAX APIs, but they're a serious pain to do anything nontrivial with directly.
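    For comparison, this is roughly what the SAX route looks like, again with a hypothetical <record> element in big.xml; you have to reassemble the text yourself from the callbacks, which is where the pain comes from:

        import xml.sax

        class RecordHandler(xml.sax.ContentHandler):
            def __init__(self):
                super().__init__()
                self.count = 0
                self.buffer = []

            def startElement(self, name, attrs):
                if name == "record":
                    self.buffer = []      # start collecting text for this record

            def characters(self, content):
                self.buffer.append(content)

            def endElement(self, name):
                if name == "record":
                    self.count += 1
                    text = "".join(self.buffer)
                    # process `text` here; nothing is retained afterwards

        handler = RecordHandler()
        xml.sax.parse("big.xml", handler)
        print(handler.count, "records seen")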
