I am trying to parse the Stack Overflow data dump. One of the tables, posts.xml, has around 10 million entries in it. Sample XML:
It is pretty much the same approach I've already described in an answer here.
Scroll down to the org.xml.sax Implementation part. You'll only need a custom handler.
org.xml.sax Implementation
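For a file of this size, a SAX handler is a natural fit because it sees one element at a time and never holds the whole document in memory. Below is a minimal sketch of such a custom handler, not the exact code from the linked answer. Since the sample XML is not shown here, it assumes each post is a `<row>` element carrying its fields as attributes (`Id`, `PostTypeId`), and that `PostTypeId="1"` marks a question. The class name `PostsSaxSketch` and the handler `RowHandler` are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class PostsSaxSketch {

    /** Hypothetical custom handler: called once per element, keeps only counters. */
    static class RowHandler extends DefaultHandler {
        long rows = 0;
        long questions = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            // Assumption: every post is a <row .../> element with attribute fields.
            if (!"row".equals(qName)) {
                return;
            }
            rows++;
            // Assumption: PostTypeId="1" marks a question.
            if ("1".equals(attrs.getValue("PostTypeId"))) {
                questions++;
            }
        }
    }

    /** Streams the input through the handler and returns it with its counters filled in. */
    static RowHandler parse(InputStream in) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        RowHandler handler = new RowHandler();
        parser.parse(in, handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        // Tiny in-memory stand-in for posts.xml; for the real dump, pass a FileInputStream instead.
        String xml = "<posts>"
                + "<row Id=\"1\" PostTypeId=\"1\" />"
                + "<row Id=\"2\" PostTypeId=\"2\" />"
                + "</posts>";
        RowHandler h = parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(h.rows + " rows, " + h.questions + " questions");
    }
}
```

To do real work per post, replace the counters with whatever you need (for example, a batched database insert) inside `startElement`. Avoid collecting all rows in a list, since that would bring back the memory problem SAX is meant to avoid.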