Parsing extremely large XML files in PHP

Asked by 花落未央 on 2020-12-03 15:33

I need to parse XML files of 40GB in size, then normalize the data and insert it into a MySQL database. How much of the file I need to store in the database is not clear, neither do

2 Answers
  •  Answered by 独厮守ぢ on 2020-12-03 16:20

    In PHP, you can read extremely large XML files with XMLReader (Docs):

    $reader = new XMLReader();
    $reader->open($xmlfile);
    

    Extremely large XML files should be stored in a compressed format on disk; this makes sense because XML has a high compression ratio. For example, gzipped as large.xml.gz.

    PHP supports this quite well: XMLReader can read such files through the compression stream wrappers (Docs):

    $xmlfile = 'compress.zlib://path/to/large.xml.gz';
    
    $reader = new XMLReader();
    $reader->open($xmlfile);
    

    XMLReader only lets you operate on the current node; it is a forward-only parser. If you need to keep parser state, you have to build it yourself.
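
    As a minimal sketch of that forward-only style, the loop below streams through the document and handles one record at a time. The element names (`item`, `name`) and the inline sample document are assumptions for illustration; for a real 40GB file you would pass a file path (or a compress.zlib:// path) to open() instead of XML():

    ```php
    <?php
    // Assumed sample input; a real run would use $reader->open($xmlfile).
    $xml = <<<'XML'
    <catalog>
      <item><name>first</name></item>
      <item><name>second</name></item>
    </catalog>
    XML;

    $reader = new XMLReader();
    $reader->XML($xml);

    $doc = new DOMDocument();
    $names = [];
    while ($reader->read()) {
        // React only when the cursor lands on the start of an <item> element.
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'item') {
            // expand() copies just the current subtree into DOM, so memory
            // stays bounded by one record, not the whole file.
            $node = simplexml_import_dom($doc->importNode($reader->expand(), true));
            $names[] = (string) $node->name;
        }
    }
    echo implode(',', $names), "\n"; // first,second
    ```

    Each record could be normalized and inserted into MySQL inside the loop; the reader itself never holds more than the current subtree in memory.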

    I often find it helpful to wrap the basic cursor movements in a set of iterators that know how to operate on an XMLReader, for example iterating over elements or child elements only. You find this outlined in Parse XML with PHP and XMLReader.
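
    One way to sketch such a wrapper, assuming PHP 5.5+ generators, is a function that yields the reader each time it lands on a matching element, so calling code can simply foreach over records (the element name `item` and the inline document are illustrative assumptions):

    ```php
    <?php
    // Hypothetical helper: yield the reader positioned on each element
    // whose local name matches $name, in document order.
    function elements(XMLReader $reader, string $name): Generator {
        while ($reader->read()) {
            if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === $name) {
                yield $reader;
            }
        }
    }

    $reader = new XMLReader();
    $reader->XML('<feed><item>a</item><item>b</item><other/><item>c</item></feed>');

    $texts = [];
    foreach (elements($reader, 'item') as $r) {
        $texts[] = $r->readString(); // text content of the current element
    }
    echo implode(',', $texts), "\n"; // a,b,c
    ```

    The generator keeps the forward-only traversal in one place, while the consuming loop reads like ordinary iteration.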

    See as well:

    • PHP open gzipped XML
