Can Python xml ElementTree parse a very large xml file?


误落风尘 2020-12-17 19:45

I'm trying to parse a large file (> 2GB) of structured markup data, and there is not enough memory for this. Which XML parsing class is the best choice for this situation? Mo

5 answers

情话喂你 (OP)

2020-12-17 20:27

Yes. Ten years later, there are many newer solutions for handling large files. Here is one I recommend.

For example, suppose the file test.xml contains the following:



    
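    <food>
        <name>Strawberry Belgian Waffles</name>
        <price>$7.95</price>
        <description>
        Light Belgian waffles covered with strawberries and whipped cream
        </description>
        <calories>900</calories>
    </food>
    <food>
        <name>Berry-Berry Belgian Waffles</name>
        <price>$8.95</price>
        <description>
        Belgian waffles covered with assorted fresh berries and whipped cream
        </description>
        <calories>900</calories>
    </food>
    ......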

The solution using SimplifiedDoc is as follows:

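from simplified_scrapy import SimplifiedDoc

doc = SimplifiedDoc()
# Load the file line by line rather than reading it all at once
doc.loadFile('test.xml', lineByline=True)

# Walk the 'food' elements one at a time and print their child texts
for food in doc.getIterable('food'):
    print(food.children.text)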

Result:

['Strawberry Belgian Waffles', '$7.95', 'Light Belgian waffles covered with strawberries and whipped cream', '900']
...
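
Since the question asks about ElementTree specifically, here is a rough sketch of the same idea using only the standard library's `xml.etree.ElementTree.iterparse`, which reads the file incrementally instead of building the whole tree. The helper name `iter_child_texts` and the small in-memory sample (including the `menu`, `name` and `price` tag names) are assumptions made for illustration; only the `food` tag comes from the example above.

```python
import io
import xml.etree.ElementTree as ET


def iter_child_texts(source, tag="food"):
    """Yield the stripped child texts of each <tag> element, one at a time.

    `source` may be a filename or a binary file object.
    This helper is hypothetical, not part of any library.
    """
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the start of the root element; keep a handle on it.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield [(child.text or "").strip() for child in elem]
            # Discard elements that were already processed so the
            # in-memory tree does not grow with the file size.
            root.clear()


# Small in-memory stand-in for a large file (tag names are assumed).
sample = io.BytesIO(
    b"<menu>"
    b"<food><name>Strawberry Belgian Waffles</name><price>$7.95</price></food>"
    b"<food><name>Berry-Berry Belgian Waffles</name><price>$8.95</price></food>"
    b"</menu>"
)

for texts in iter_child_texts(sample):
    print(texts)
```

For a real file, pass the path instead of the `BytesIO` object, e.g. `iter_child_texts('test.xml')`. Clearing the root after each element is what keeps memory use roughly constant; without it, `iterparse` still accumulates the full tree.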
