Parsing/scanning through a 17 GB XML file

耶瑟儿~ 2020-12-19 16:06

I am trying to parse the Stack Overflow dump file (Posts.xml, 17 GB). It is of the form:



3 answers
  •  难免孤独
    2020-12-19 17:07

    Using PHP's XMLReader seems to be the right approach.

    The reason is this statement of yours:

    I have to 'group' each question with their answers. Basically find a question (posttypeid=1) find its answers using parentId of another row and store it in db.

    As I understand it, you want to build a database of questions and answers. In that case there is no reason to do the "grouping" at the XML level. Put all the relevant information into the database and do the grouping at the DB level, with database commands (SQL and so on).
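    As a rough illustration of grouping at the DB level, here is a minimal sketch in Python with the standard-library sqlite3 module. The table name posts, its column names, the sample rows, and the helper answers_by_question are all hypothetical, and it assumes that answers carry post type 2 and point to their question through a parent id.

```python
import sqlite3

# Hypothetical schema: one table holding questions and answers alike.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE posts (
           id INTEGER PRIMARY KEY,
           post_type_id INTEGER NOT NULL,  -- assumed: 1 = question, 2 = answer
           parent_id INTEGER,              -- question id for answers, else NULL
           body TEXT
       )"""
)
# An index on parent_id keeps the self-join cheap on a large table.
conn.execute("CREATE INDEX idx_posts_parent ON posts (parent_id)")

# Made-up sample rows, only for illustration.
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?, ?)",
    [
        (1, 1, None, "How do I parse a large XML file?"),
        (2, 2, 1, "Use a streaming parser."),
        (3, 2, 1, "Load the rows into a database first."),
        (4, 1, None, "A question without answers"),
    ],
)


def answers_by_question(conn):
    """Return {question_id: [answer_id, ...]} using a self-join."""
    rows = conn.execute(
        """SELECT q.id, a.id
           FROM posts AS q
           LEFT JOIN posts AS a
                  ON a.parent_id = q.id AND a.post_type_id = 2
           WHERE q.post_type_id = 1
           ORDER BY q.id, a.id"""
    )
    grouped = {}
    for question_id, answer_id in rows:
        answers = grouped.setdefault(question_id, [])
        if answer_id is not None:  # LEFT JOIN yields NULL when there is no answer
            answers.append(answer_id)
    return grouped
```

    With this layout the XML pass only has to insert rows in file order; matching answers to questions is left to the join.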

    What you have to do is use a streaming approach, something like the "target parser method" described in, e.g., "High-performance XML parsing in Python with lxml". Even though that article is about Python, it is a good starting point, and the same approach should be possible with XMLReader.
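    The target parser idea can be sketched as follows. This uses Python's standard-library xml.etree.ElementTree rather than lxml or PHP's XMLReader, so treat it as an illustration of the streaming pattern, not as the article's code. It assumes each post is a row element with Id, PostTypeId, ParentId and Body attributes; the class PostTarget, the function load_posts, and the posts table are hypothetical names.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET


class PostTarget:
    """Parser target: sees each element as it is read; no tree is built."""

    def __init__(self, conn, batch_size=10_000):
        self.conn = conn
        self.batch_size = batch_size
        self.batch = []
        self.count = 0

    def start(self, tag, attrib):
        # Assumed layout: every post is a <row .../> element with attributes.
        if tag != "row":
            return
        parent = attrib.get("ParentId")
        self.batch.append((
            int(attrib["Id"]),
            int(attrib["PostTypeId"]),
            int(parent) if parent is not None else None,
            attrib.get("Body", ""),
        ))
        if len(self.batch) >= self.batch_size:
            self.flush()

    def end(self, tag):
        pass  # nothing to do: all data lives in the attributes

    def data(self, text):
        pass  # text between tags is ignored

    def flush(self):
        # Insert in batches so memory use stays bounded.
        self.conn.executemany(
            "INSERT INTO posts VALUES (?, ?, ?, ?)", self.batch
        )
        self.conn.commit()
        self.count += len(self.batch)
        self.batch.clear()

    def close(self):
        self.flush()
        return self.count  # becomes the return value of parser.close()


def load_posts(stream, conn, chunk_size=1 << 20):
    """Feed a binary stream to the parser chunk by chunk; return rows stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "id INTEGER PRIMARY KEY, post_type_id INTEGER, "
        "parent_id INTEGER, body TEXT)"
    )
    parser = ET.XMLParser(target=PostTarget(conn))
    while chunk := stream.read(chunk_size):
        parser.feed(chunk)
    return parser.close()
```

    For a real dump, stream would be the file opened in binary mode, e.g. open("Posts.xml", "rb"), and conn a connection to an on-disk database. Only one chunk and one batch of rows are held in memory at a time.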
