I am trying to parse the Stack Overflow dump file (Posts.xml, 17 GB). It is of the form:
.
Using PHP's XMLReader seems to be the right thing to do.
Reason: because of your statement:
I have to 'group' each question with their answers. Basically find a question (posttypeid=1) find its answers using parentId of another row and store it in db.
As I understand it, you want to build a database of questions and answers. There is therefore no reason to do the "grouping" at the XML level. Put all the relevant information into the database and do the grouping at the DB level, with database commands (SQL ...).
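To make the "group at the DB level" idea concrete, here is a minimal sketch in Python with SQLite. The table name `posts`, its columns, and the helper `group_answers` are made up for illustration, and I am assuming answers are marked with post type 2; adapt it to whatever schema and database you actually use.

```python
import sqlite3

def group_answers(conn):
    """Return {question_id: [answer_id, ...]} via a self-join on posts."""
    rows = conn.execute(
        """
        SELECT q.id, a.id
        FROM posts AS q
        LEFT JOIN posts AS a
               ON a.parent_id = q.id AND a.post_type_id = 2
        WHERE q.post_type_id = 1
        ORDER BY q.id, a.id
        """
    )
    grouped = {}
    for question_id, answer_id in rows:
        answers = grouped.setdefault(question_id, [])
        if answer_id is not None:  # questions without answers keep an empty list
            answers.append(answer_id)
    return grouped

# Hypothetical schema: one flat table holding questions and answers alike.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts "
    "(id INTEGER PRIMARY KEY, post_type_id INTEGER, parent_id INTEGER, body TEXT)"
)
# An index on parent_id keeps the join cheap once the table is large.
conn.execute("CREATE INDEX idx_posts_parent ON posts (parent_id)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?, ?)",
    [
        (1, 1, None, "question one"),
        (2, 2, 1, "answer to one"),
        (3, 1, None, "question two"),
        (4, 2, 1, "another answer to one"),
    ],
)
result = group_answers(conn)
print(result)
```

The point is that rows can be inserted in whatever order they appear in the file; the question/answer relationship is recovered afterwards by the join, so nothing has to be held in memory while parsing.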
What you have to do is use something like the "target parser method" described in "High-performance XML parsing in Python with xml" (even though it is for Python, it's a good start). This should be possible with XMLReader.
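As a rough illustration of the streaming idea, here is a sketch in Python. Note that it uses the standard library's `xml.etree.ElementTree.iterparse` rather than the target parser method from the article, and it is not PHP. The `<row>` element name and the `Id`, `PostTypeId`, `ParentId` and `Body` attributes are my assumption about the dump format, and `load_posts` and the `posts` table are hypothetical names. With XMLReader the loop would presumably be built around `read()` and `getAttribute()` instead.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

def load_posts(source, conn, batch_size=1000):
    """Stream <row> elements from source into posts; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts "
        "(id INTEGER PRIMARY KEY, post_type_id INTEGER, parent_id INTEGER, body TEXT)"
    )
    batch, total = [], 0
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "row":
            parent = elem.get("ParentId")  # assumed to be absent on questions
            batch.append((
                int(elem.get("Id")),
                int(elem.get("PostTypeId")),
                int(parent) if parent is not None else None,
                elem.get("Body"),
            ))
            total += 1
            if len(batch) >= batch_size:
                conn.executemany("INSERT INTO posts VALUES (?, ?, ?, ?)", batch)
                batch.clear()
            root.clear()  # detach processed rows so memory use stays flat
    if batch:  # flush the final partial batch
        conn.executemany("INSERT INTO posts VALUES (?, ?, ?, ?)", batch)
    conn.commit()
    return total

# Tiny in-memory stand-in for Posts.xml (assumed layout).
sample = io.BytesIO(
    b'<posts>'
    b'<row Id="1" PostTypeId="1" Body="question one" />'
    b'<row Id="2" PostTypeId="2" ParentId="1" Body="answer to one" />'
    b'<row Id="3" PostTypeId="2" ParentId="1" Body="another answer" />'
    b'</posts>'
)
conn = sqlite3.connect(":memory:")
loaded = load_posts(sample, conn)
answers = conn.execute(
    "SELECT id FROM posts WHERE parent_id = 1 ORDER BY id"
).fetchall()
print(loaded, answers)
```

For the real file you would pass the path to Posts.xml instead of the `BytesIO` object. Inserting in batches and clearing the root after each row are what keep a 17 GB input from being loaded into memory at once.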