Loading huge XML files and dealing with MemoryError

Submitted by 你说的曾经没有我的故事 on 2019-12-04 04:36:43

Do not use BeautifulSoup to try to parse such a large XML file. Use the ElementTree API instead. Specifically, use the iterparse() function to parse your file as a stream, handle information as you are notified of elements, then delete the elements again:

from xml.etree import ElementTree as ET

# filename is the path to your XML file
parser = ET.iterparse(filename)

for event, element in parser:
    # by default only "end" events are reported, so element is complete here
    if element.tag == 'yourelement':
        # do something with this element
        # then clean up
        element.clear()

By using an event-driven approach, you never need to hold the whole XML document in memory; you extract only what you need and discard the rest.
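One thing to watch for: element.clear() empties an element, but the emptied element is still attached to its parent, so a very long file can still accumulate many small empty nodes under the root. A common variation, sketched below, grabs the root from the first "start" event and clears it after each processed element. The function name count_records, the tag name "record", and the in-memory sample document are placeholders invented for this illustration, not part of your data.

```python
import io
from xml.etree import ElementTree as ET


def count_records(source, tag="record"):
    """Stream through `source` and count the elements named `tag`."""
    count = 0
    # Request "start" events as well, so the root element can be captured.
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, element in context:
        if event == "end" and element.tag == tag:
            count += 1  # stand-in for "do something with this element"
            # Drop everything attached to the root so far, including the
            # element just handled, so nothing piles up in memory.
            root.clear()
    return count


# Hypothetical sample: 1000 small records wrapped in a single root element.
sample = io.BytesIO(
    b"<root>" + b"<record><name>x</name></record>" * 1000 + b"</root>"
)
print(count_records(sample))
```

With a real file you would pass the path or an open binary file object instead of the BytesIO sample.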

See the iterparse() tutorial and documentation.

Alternatively, you can use the lxml library; it offers the same API in a faster and more feature-rich package.
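Because the two libraries share the iterparse() interface, you can write the streaming loop once and pick the implementation at import time. The following is only a sketch under that assumption: it uses lxml if it happens to be installed and falls back to the standard library otherwise. The helper name element_texts and the sample document are made up for the example.

```python
import io

try:
    from lxml import etree as ET  # third-party; used only if installed
except ImportError:
    from xml.etree import ElementTree as ET  # standard-library fallback


def element_texts(source, tag):
    """Yield the text of each `tag` element, clearing it afterwards."""
    for event, element in ET.iterparse(source, events=("end",)):
        if element.tag == tag:
            yield element.text
            # Runs when the generator resumes, after the caller has the text.
            element.clear()


# Hypothetical sample document held in memory.
data = io.BytesIO(b"<items><item>a</item><item>b</item></items>")
print(list(element_texts(data, "item")))
```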
