Question
I'm using lxml's etree.parse to parse a fairly huge XML file (around 65 MB to 300 MB). When I run my standalone Python script containing the function below, I get a memory allocation failure:
Error:
Memory allocation failed : xmlSAX2Characters, line 5350155, column 16
Partial function code:
def getID():
    try:
        from lxml import etree
        xml = etree.parse(<xml_file>)  # here is where the failure occurs
        for element in xml.iter():
            ...
        result = <formed by concatenating element texts>
        return result
    except Exception as ex:
        <handle exception>
The weird thing is that when I enter the same function in IDLE and test it with the same XML file, I don't get any memory allocation error.
Any ideas on this issue? Thanks in advance.
Answer 1:
I would parse the document with the iterative parser (etree.iterparse) instead, calling .clear() on each element once you are done with it; that way you avoid loading the whole document into memory in one go.
You can limit the iterative parser to only the tags you are interested in. If you only want to parse <person> tags, tell the parser so:
from lxml import etree

for _, element in etree.iterparse(input, tag='person'):
    # process your person data
    element.clear()  # discard this element's contents once processed
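By clearing each element in the loop, you free its text, attributes, and children as soon as you have processed it; only the emptied element itself stays attached to its parent.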
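A fuller sketch that mirrors the question's goal of concatenating element texts might look like the following. The helper name concat_texts is hypothetical, and the code relies on the lxml-specific getprevious() and getparent() methods. Besides calling clear(), it deletes the already-processed siblings that precede the current element, a common lxml idiom, since clear() alone leaves the emptied elements attached to their parent.

```python
from io import BytesIO

from lxml import etree


def concat_texts(source, tag):
    """Concatenate the text of every <tag> element in an XML source.

    Hypothetical helper. `source` may be a filename or a binary file-like
    object. Because the tree is pruned while parsing, memory stays roughly
    proportional to one <tag> element for a flat layout such as
    <people><person/>...</people>.
    """
    parts = []
    # Only "end" events for the requested tag are yielded, i.e. after each
    # matching element (with all its descendants) has been fully parsed.
    for _, elem in etree.iterparse(source, events=("end",), tag=tag):
        # Collect all text inside this element, descendants included.
        parts.append("".join(elem.itertext()))
        # Free the element's children, text and attributes.
        elem.clear()
        # clear() leaves the empty element attached to its parent; also
        # delete the already-processed siblings that precede it.
        while elem.getprevious() is not None:
            del elem.getparent()[0]
    return "".join(parts)


if __name__ == "__main__":
    data = b"<people><person>Ann</person><person>Bob</person></people>"
    print(concat_texts(BytesIO(data), "person"))  # AnnBob
```

Passing a filename instead of a BytesIO object works the same way, which is how you would point it at the 65 MB to 300 MB file from the question.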
Source: https://stackoverflow.com/questions/10855921/lxml-etree-parse-memoryallocation-error