I'm trying to parse a large file (> 2GB) of structured markup data, and there is not enough memory for it. Which XML parsing class is the best choice in this situation?
Yes, ten years later, there are many new solutions for handling large files. Here is one I recommend.
For example, suppose the content of the file test.xml is as follows:
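<food>
    <name>Strawberry Belgian Waffles</name>
    <price>$7.95</price>
    <description>Light Belgian waffles covered with strawberries and whipped cream</description>
    <calories>900</calories>
</food>
<food>
    <name>Berry-Berry Belgian Waffles</name>
    <price>$8.95</price>
    <description>Belgian waffles covered with assorted fresh berries and whipped cream</description>
    <calories>900</calories>
</food>
......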
The solution using SimplifiedDoc is as follows:
from simplified_scrapy import SimplifiedDoc

doc = SimplifiedDoc()
# lineByline=True asks loadFile to read the file line by line
doc.loadFile('test.xml', lineByline=True)
# Visit the food elements one at a time
for food in doc.getIterable('food'):
    print(food.children.text)
Result:
['Strawberry Belgian Waffles', '$7.95', 'Light Belgian waffles covered with strawberries and whipped cream', '900']
...
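
For comparison, the same streaming idea can be sketched with only the standard library, using `xml.etree.ElementTree.iterparse`. This is not part of SimplifiedDoc and is not necessarily how it works internally; it is a minimal sketch of incremental parsing. The helper name `iter_food_texts`, the `breakfast_menu` root, and the child tag names in the sample are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Small in-memory stand-in for test.xml; the root and child tag names are assumed.
SAMPLE_XML = b"""<breakfast_menu>
<food>
    <name>Strawberry Belgian Waffles</name>
    <price>$7.95</price>
    <description>Light Belgian waffles covered with strawberries and whipped cream</description>
    <calories>900</calories>
</food>
<food>
    <name>Berry-Berry Belgian Waffles</name>
    <price>$8.95</price>
    <description>Belgian waffles covered with assorted fresh berries and whipped cream</description>
    <calories>900</calories>
</food>
</breakfast_menu>"""


def iter_food_texts(source, tag="food"):
    """Yield the child texts of each <tag> element without building the whole tree.

    `source` can be a file name or a binary file object.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield [child.text for child in elem]
            # Drop elements that have already been processed so memory use
            # does not grow with the size of the file.
            root.clear()


for texts in iter_food_texts(io.BytesIO(SAMPLE_XML)):
    print(texts)
```

For a real file, pass the path instead of the `BytesIO` object, e.g. `iter_food_texts('test.xml')`. Clearing the root after each element is what keeps memory use low; without it the parsed elements stay attached to the tree.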