Python sax to lxml for 80+GB XML

后悔当初 2020-12-04 22:43

How would you read an XML file using SAX and convert it to an lxml etree.iterparse element?

To provide an overview of the problem, I have built an XML ingestion tool

3 Answers
  •  囚心锁ツ
    2020-12-04 23:29

    This question is a couple of years old, and I don't have enough reputation to comment directly on the accepted answer, but I tried using it to parse an OSM file in which I am finding all intersections in a country. My original problem was running out of RAM, so I thought I would have to use the SAX parser, but I found this answer instead. Strangely, it wasn't parsing correctly: the suggested cleanup was somehow clearing the elem node before I had read through it (I'm still not sure how this was happening). I removed elem.clear() from the code, and now it runs perfectly fine!
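
    One possible explanation, offered as a guess rather than a diagnosis of the code in the accepted answer: if elem.clear() runs on every "end" event, child elements such as <nd> are emptied before the "end" event of their parent <way> arrives, so the parent looks blank by the time it is read. The sketch below clears an element only after it has been fully processed. The function name iter_way_refs, the sample data, and the fallback to the standard library's xml.etree.ElementTree (used only when lxml is not installed) are assumptions made for illustration.

```python
import io

try:
    from lxml import etree  # the library discussed in the question
except ImportError:
    # Assumption for illustration: the stdlib offers a similar iterparse().
    import xml.etree.ElementTree as etree


def iter_way_refs(source, tag="way"):
    """Yield (way id, [node refs]) per <way>, clearing only finished ways."""
    for _event, elem in etree.iterparse(source, events=("end",)):
        if elem.tag != tag:
            # Do not clear here: <nd> children must survive until the
            # "end" event of their parent <way> has been handled.
            continue
        refs = [nd.get("ref") for nd in elem.iter("nd")]
        yield elem.get("id"), refs
        elem.clear()  # safe now: everything needed has been copied out


# Hypothetical miniature OSM document, just to exercise the function.
SAMPLE = b"""<osm>
  <node id="1"/><node id="2"/>
  <way id="10"><nd ref="1"/><nd ref="2"/></way>
  <way id="11"><nd ref="2"/></way>
</osm>"""

ways = list(iter_way_refs(io.BytesIO(SAMPLE)))
print(ways)
```

    Note that this sketch does not free elements it skips (for example <node>), so on a very large file those would still accumulate under the root; dropping elem.clear() entirely, as I did, keeps the whole tree in memory and may bring the RAM problem back.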
