Parsing a very large HTML file with Python (ElementTree?)
I previously asked about using BeautifulSoup to parse a very large (270 MB) HTML file and getting a memory error, and was pointed toward ElementTree as a solution. I was trying to use its event-driven parsing, as documented. Testing it with the smaller settings file worked fine:

    >>> import xml.etree.ElementTree as ET  # standard-library ElementTree (assumed)
    >>> settings = open('S:\\Documents\\FacebookData\\html\\settings.htm')
    >>> for event, element in ET.iterparse(settings, events=("start", "end")):
    ...     print("%5s, %4s, %s" % (event, element.tag, element.text))

This successfully prints out the elements. However, using that same code with 'messages.htm' instead of 'settings.htm' just to