I am trying to parse the Stack Overflow dump file (Posts.xml, 17 GB). It is of the form:
.
Because your processing of this large file needs direct (random) access rather than a single sequential pass, I think the only viable option is to load the data into an XML database.
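To illustrate the "load once, then look things up directly" idea, here is a minimal sketch in Python. Note that it swaps in SQLite as a stand-in for a real XML database, simply because it ships with Python; it is not an XML database and does not give you XPath/XQuery. It also assumes the file is a flat list of `<row>` elements under one root, each with an `Id` attribute, which you should check against your file. The names `load_rows`, `get_row`, and the `posts` table are made up for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def load_rows(xml_source, conn, tag="row", key="Id"):
    """Stream elements named `tag` into an indexed table, one at a time.

    Assumption: each element carries a unique `key` attribute.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts (id TEXT PRIMARY KEY, xml TEXT)"
    )
    count = 0
    context = ET.iterparse(xml_source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            conn.execute(
                "INSERT OR REPLACE INTO posts VALUES (?, ?)",
                (elem.get(key), ET.tostring(elem, encoding="unicode")),
            )
            count += 1
            root.clear()  # drop already-processed children to keep memory flat
    conn.commit()
    return count


def get_row(conn, row_id):
    """Fetch one stored element by id via the primary-key index."""
    rec = conn.execute(
        "SELECT xml FROM posts WHERE id = ?", (row_id,)
    ).fetchone()
    return ET.fromstring(rec[0]) if rec else None
```

You would call it roughly as `conn = sqlite3.connect("posts.db")`, then `load_rows("Posts.xml", conn)` once, and afterwards `get_row(conn, "42")` for each lookup. The load is still a full sequential pass over the 17 GB, but you only pay for it once; for a file that size you would probably also want to commit in batches.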