lxml etree.parse MemoryAllocation Error

Submitted by 不问归期 on 2021-02-07 19:52:25

Question


I'm using lxml etree.parse to parse a fairly large XML file (around 65MB - 300MB). When I run my standalone Python script containing the function below, I get a memory allocation failure:

Error:

     Memory allocation failed : xmlSAX2Characters, line 5350155, column 16

Partial function code:

def getID():
    try:
        from lxml import etree
        xml = etree.parse(<xml_file>)  # here is where the failure occurs
        for element in xml.iter():
            ...
            result = <formed by concatenating element texts>
        return result
    except Exception as ex:
        <handle exception>

The weird thing is that when I enter the same function in IDLE and test it with the same XML file, I do not get any memory allocation error.

Any ideas on this issue? Thanks in advance.


Answer 1:


I would parse the document with the iterative parser instead, calling .clear() on each element once you are done with it; that way you avoid loading the whole document into memory in one go.

You can limit the iterative parser to only those tags you are interested in. If you only want to parse <person> tags, tell your parser so:

for _, element in etree.iterparse(input, tag='person'):
    # process your person data
    element.clear()

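Clearing each element in the loop releases its text, attributes, and children, so the processed data does not pile up in memory while the rest of the file is parsed.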
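
As a rough sketch of how this might look for the question's goal of concatenating element texts, the snippet below walks the document with iterparse and keeps only the text it needs. The function name concat_texts, the person tag, and the in-memory sample document are all made up for illustration. The standard-library fallback is also an assumption, added so the sketch runs without lxml installed; because the standard library's iterparse has no tag argument, the tag is checked inside the loop instead of being passed to the parser as in the answer above.

```python
import io

try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Assumption: fall back to the standard library so the sketch runs anywhere.
    import xml.etree.ElementTree as etree


def concat_texts(source, tag):
    """Join the text of every <tag> element without keeping their content around."""
    parts = []
    # Only "end" events are requested, so each element is complete when seen.
    for _, element in etree.iterparse(source, events=("end",)):
        if element.tag == tag:
            parts.append(element.text or "")
            # Release the element's text, attributes, and children.
            element.clear()
    return "".join(parts)


# Hypothetical sample data standing in for the large XML file.
sample = io.BytesIO(b"<people><person>Ann</person><person>Bob</person></people>")
result = concat_texts(sample, "person")
```

In a real script you would pass the path of the XML file, or an open binary file object, instead of the sample buffer.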



Source: https://stackoverflow.com/questions/10855921/lxml-etree-parse-memoryallocation-error
