Using Python lxml.etree for huge XML files

Anonymous (unverified), submitted on 2019-12-03 02:30:02

Question:

I would like to parse a huge XML file (>200 MB) using lxml.etree in Python. I tried to load the file with etree.parse, but this fails, apparently because of the file size:

etree.parse('file.xml')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 2706, in lxml.etree.parse (src/lxml/lxml.etree.c:49958)
  File "parser.pxi", line 1500, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:71797)
  File "parser.pxi", line 1529, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:72080)
  File "parser.pxi", line 1429, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:71175)
  File "parser.pxi", line 975, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:68173)
  File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:64257)
  File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:65178)
  File "parser.pxi", line 565, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64521)
lxml.etree.XMLSyntaxError: Excessive depth in document: 256 use XML_PARSE_HUGE option, line 1276, column 7

As I want to use XPath expressions, I have to parse the file first. How can I parse this XML file, and how do I use the XML_PARSE_HUGE option with lxml.etree?

Thanks!

Answer 1:

Try creating a custom XMLParser instance with huge_tree=True, which is how lxml exposes libxml2's XML_PARSE_HUGE option:

from lxml.etree import XMLParser, parse

p = XMLParser(huge_tree=True)
tree = parse('file.xml', parser=p)
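Since the goal is to run XPath on the parsed tree, here is a minimal sketch of the same approach that can be tried without the original file. The helper name parse_huge, the in-memory test document, and its nesting depth of 300 are assumptions made for illustration; they are not from the question. Note that huge_tree=True disables libxml2's security restrictions, so it is probably best kept for input you trust.

```python
from io import BytesIO

from lxml import etree


def parse_huge(source):
    """Parse XML with huge_tree=True (libxml2's XML_PARSE_HUGE option).

    `source` may be a filename or a file-like object.
    """
    parser = etree.XMLParser(huge_tree=True)
    return etree.parse(source, parser=parser)


# Hypothetical stand-in for 'file.xml': one <leaf/> wrapped in 300 nested
# <a> elements, i.e. deeper than the limit of 256 named in the error message.
depth = 300
xml = b"<a>" * depth + b"<leaf/>" + b"</a>" * depth

tree = parse_huge(BytesIO(xml))

# XPath works on the resulting tree as usual.
leaves = tree.xpath("//leaf")
nesting = len(list(leaves[0].iterancestors()))
```

For the real file, calling parse_huge('file.xml') and then tree.xpath(...) should behave the same way.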

