Parse XML with PHP and XMLReader

野性不改 2020-12-21 02:29

I've been trying to parse a very large XML file with PHP and XMLReader, but can't seem to get the results I am looking for. Basically, I'm searching a ton of information

2 answers
  •  谎友^ (original poster)
    2020-12-21 02:56

    To gain more flexibility with XMLReader, I normally write my own iterators that work on the XMLReader object and provide the steps I need.

    These range from a simple iteration over all nodes to an iteration over elements only, optionally restricted to a specific name. Let's call the last one XMLElementIterator; it takes the reader and the element name as parameters.
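
    To make the idea concrete, here is a minimal sketch of what such an element iterator could look like. It is not the XMLElementIterator from the gist referenced in this answer; the class name SketchElementIterator and its internals are assumptions made for illustration only.

```php
<?php
// Hypothetical sketch: NOT the XMLElementIterator from the referenced gist.
// It walks an XMLReader forward and stops on element nodes, optionally
// only on those with a given name.
final class SketchElementIterator implements Iterator
{
    private XMLReader $reader;
    private ?string $name;
    private int $index = 0;
    private bool $valid = false;

    public function __construct(XMLReader $reader, ?string $name = null)
    {
        $this->reader = $reader;
        $this->name   = $name;
    }

    public function rewind(): void
    {
        // XMLReader is forward-only, so "rewind" just advances to the first match.
        $this->index = 0;
        $this->moveToNextMatch();
    }

    public function valid(): bool
    {
        return $this->valid;
    }

    public function current(): XMLReader
    {
        // The reader itself, positioned on the matching element.
        return $this->reader;
    }

    public function key(): int
    {
        return $this->index;
    }

    public function next(): void
    {
        $this->index++;
        $this->moveToNextMatch();
    }

    private function moveToNextMatch(): void
    {
        // read() visits every node, so nested elements are seen as well.
        while ($this->valid = $this->reader->read()) {
            if ($this->reader->nodeType === XMLReader::ELEMENT
                && ($this->name === null || $this->reader->name === $this->name)
            ) {
                return;
            }
        }
    }
}

// Usage on a tiny inline document (illustrative data).
$reader = new XMLReader();
$reader->XML('<root><a>1</a><b>2</b><a>3</a></root>');

$values = [];
foreach (new SketchElementIterator($reader, 'a') as $element) {
    $values[] = $element->readString();
}
$reader->close();
```

    Because the reader only moves forward, an iterator like this can be traversed once per opened document.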

    In your scenario I would then create an iterator that returns a SimpleXMLElement for the current element, taking only the <headend> elements:

    require('xmlreader-iterators.php'); // https://gist.github.com/hakre/5147685
    
    class HeadendIterator extends XMLElementIterator {
        const ELEMENT_NAME = 'headend';
    
        public function __construct(XMLReader $reader) {
            parent::__construct($reader, self::ELEMENT_NAME);
        }
    
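        /**
         * @return SimpleXMLElement|false the current element, or false if it cannot be parsed
         */
        #[\ReturnTypeWillChange] // avoids the return-type deprecation notice on PHP 8.1+
        public function current() {
            // readOuterXml() returns the current element, including its own tags, as a string
            return simplexml_load_string($this->reader->readOuterXml());
        }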
    }
    

    Equipped with this iterator, the rest of the job is straightforward. First open the 10-gigabyte file (XMLReader streams it rather than loading it into memory):

    $pc      = "78746";
    
    $xmlfile = '../data/lineups.xml';
    $reader  = new XMLReader();
    $reader->open($xmlfile);
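
    XMLReader::open() returns false when the file cannot be opened, so with a file of this size it can be worth failing early. A minimal sketch, assuming a hypothetical helper named openXmlReader() that is not part of the original code:

```php
<?php
// Hypothetical helper (not from the original answer): open a file with
// XMLReader and throw instead of silently continuing on failure.
function openXmlReader(string $path): XMLReader
{
    $reader = new XMLReader();
    // is_readable() is checked first so that open() does not emit a warning
    // for a missing file; open() itself returns false on failure.
    if (!is_readable($path) || !$reader->open($path)) {
        throw new RuntimeException("Cannot open XML file: $path");
    }
    return $reader;
}

// Usage with the path from the answer:
// $reader = openXmlReader('../data/lineups.xml');
```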
    

    Then check whether each element contains the postal code you are looking for and, if so, display its data / XML:

    foreach (new HeadendIterator($reader) as $headend) {
        /* @var $headend SimpleXMLElement */
        if (!$headend->xpath("/*/postalCodes/postalCode[. = '$pc']")) {
            continue;
        }
    
        echo 'Found, name: ', $headend->name, "\n";
        echo "==========================================\n";
        $headend->asXML('php://stdout');
    }
    

    This does literally what you're trying to achieve: it iterates over the large document (which is memory-friendly) until it finds the element(s) you're interested in. You then process only that concrete element and its XML; XMLReader::readOuterXml() is a fine tool here.

    Example output:

    Found, name: Grande Gables at The Terrace
    ==========================================
    
    
            Grande Gables at The Terrace
            Grande Communications
            
                635
            
            
                11111
                22222
                33333
                78746
            
            Austin
            
                
                    002
                
                
                    003
                
            
            
                
                    Thorndale
                    Milam
                    TX
                
                
                    Thrall
                    Williamson
                    TX
                
            
        
    
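
    For readers who would rather not depend on the external iterator file, the same approach can be sketched with plain XMLReader calls. This is an illustration under assumptions, not the code from the answer: the inline XML is a made-up stand-in for the large file, $found is a hypothetical variable, and $pc is assumed to contain no quote characters because it is interpolated into the XPath expression.

```php
<?php
// Small inline stand-in for the large file (illustrative data only).
$xml = <<<XML
<lineups>
  <headend><name>A</name><postalCodes><postalCode>11111</postalCode></postalCodes></headend>
  <headend><name>B</name><postalCodes><postalCode>78746</postalCode></postalCodes></headend>
</lineups>
XML;

$pc = '78746';

$reader = new XMLReader();
$reader->XML($xml); // with a real file, use $reader->open($path) instead

$found = [];

// Advance to the first <headend> element.
while ($reader->read() && $reader->name !== 'headend') {
    // skip everything before it
}

// Hop from one <headend> to the next, skipping each subtree.
while ($reader->name === 'headend') {
    // Only the current element is expanded into a SimpleXMLElement.
    $headend = simplexml_load_string($reader->readOuterXml());

    if ($headend !== false
        && $headend->xpath("/*/postalCodes/postalCode[. = '$pc']")
    ) {
        $found[] = (string) $headend->name;
    }

    $reader->next('headend');
}

$reader->close();
```

    Only one headend element is held in memory as a SimpleXMLElement at any time; the rest of the document is streamed past.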
