Can Python xml ElementTree parse a very large xml file?

误落风尘 2020-12-17 19:45

I'm trying to parse a large file (> 2GB) of structured markup data, and there is not enough memory for it. What is the optimal XML parsing class for this situation? Mo

5 Answers
  •  情话喂你
    2020-12-17 20:27

    Yes. Ten years on, there are many newer solutions for handling large files. Here is one I recommend.

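    For example, suppose the content of the file test.xml is as follows:

    <?xml version="1.0" encoding="UTF-8"?>
    <breakfast_menu>
        <food>
            <name>Strawberry Belgian Waffles</name>
            <price>$7.95</price>
            <description>
            Light Belgian waffles covered with strawberries and whipped cream
            </description>
            <calories>900</calories>
        </food>
        <food>
            <name>Berry-Berry Belgian Waffles</name>
            <price>$8.95</price>
            <description>
            Belgian waffles covered with assorted fresh berries and whipped cream
            </description>
            <calories>900</calories>
        </food>
        ......
    </breakfast_menu>
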

    The solution using SimplifiedDoc is as follows:

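    from simplified_scrapy import SimplifiedDoc

    doc = SimplifiedDoc()
    # lineByline=True asks the library to read the file line by line
    # instead of loading it into memory all at once
    doc.loadFile('test.xml', lineByline=True)

    # Visit each <food> element in turn and print the text of its children
    for food in doc.getIterable('food'):
        print(food.children.text)
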

    Result:

    ['Strawberry Belgian Waffles', '$7.95', 'Light Belgian waffles covered with strawberries and whipped cream', '900']
    ...
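
    Since the question asks about ElementTree specifically, here is a rough standard-library equivalent using xml.etree.ElementTree.iterparse. Treat it as a minimal sketch rather than a tuned solution: the helper name iter_food_texts and the small in-memory SAMPLE are made up for illustration, every tag name other than food is an assumption, and it assumes the food elements are direct children of the root.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data; only the <food> tag comes from the example above,
# the other tag names are assumptions made for illustration.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<breakfast_menu>
    <food>
        <name>Strawberry Belgian Waffles</name>
        <price>$7.95</price>
    </food>
    <food>
        <name>Berry-Berry Belgian Waffles</name>
        <price>$8.95</price>
    </food>
</breakfast_menu>
"""


def iter_food_texts(source, tag="food"):
    """Yield the stripped child texts of each <tag> element, one element at a time.

    `source` may be a file name or a binary file object.
    """
    # iterparse reads the input incrementally instead of building the full tree.
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield [(child.text or "").strip() for child in elem]
            # Drop the elements already processed so memory use stays bounded.
            root.clear()


if __name__ == "__main__":
    for texts in iter_food_texts(io.BytesIO(SAMPLE)):
        print(texts)
```

    For a real file, pass the path instead of the BytesIO object, for example iter_food_texts('test.xml'). The root.clear() call is what keeps memory flat. Without it, the parsed elements would accumulate under the root just as they do with a full parse.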
    
