XML DomDocument optimization

对着背影说爱祢 submitted on 2019-12-22 11:04:19

Question


I have a 5 MB XML file.

I'm using the following code to get the nodeValue of every node:

$dom = new DOMDocument('1.0', 'UTF-8');
if (!$dom->load($url)) {
    return;
}

$games = $dom->getElementsByTagName("game");
foreach ($games as $game) {

}

This takes 76 seconds, and there are around 2000 game tags. Is there any optimization or other way to get the data?


Answer 1:


You shouldn't use the Document Object Model on large XML files; it is intended for human-readable documents, not big datasets!

If you want fast access you should use XMLReader or SimpleXML.

XMLReader is ideal for parsing whole documents, and SimpleXML has a nice XPath function for retrieving data quickly.

For XMLReader you can use the following code:

<?php

// Parse a large document with XMLReader, expanding each <game> into a small DOM for DOMXPath
$reader = new XMLReader();

if (!$reader->open("tooBig.xml")) {
    exit("Unable to open tooBig.xml");
}

while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === "game") {
        // Copy the current <game> subtree into its own small document
        $node = $reader->expand();
        $dom = new DOMDocument();
        $n = $dom->importNode($node, true);
        $dom->appendChild($n);
        $xp = new DOMXPath($dom);
        $res = $xp->query("/game/title"); // this is an example
        if ($res->length > 0) {
            echo $res->item(0)->nodeValue;
        }
    }
}
$reader->close();
?>

The code above outputs every game title, assuming each game element has a title child (a /game/title structure).

For SimpleXML you can use:

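$xml = file_get_contents($url);
$sxml = new SimpleXMLElement($xml);
$games = $sxml->xpath('//game'); // returns an array of SimpleXMLElement objects
foreach ($games as $game)
{
   // SimpleXML has no nodeValue property; child elements are read as properties.
   // title is an example, matching the /game/title structure assumed above.
   print $game->title;
}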



Answer 2:


I once wrote a blog article about loading huge XML files with XMLReader; you can probably use some of it.

Using DOM or SimpleXML is not an option, since both load the whole document into memory.
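
As a rough illustration of the streaming idea (a sketch, not code from the blog article), the reader can be moved from one game element to the next without building a tree for the whole file. The function name streamGames, the sample data and the title child are assumptions made for the example. Each record is small, so wrapping a single record in SimpleXMLElement does not pull the whole document into memory.

```php
<?php
// Sketch only: streamGames() and the sample data below are hypothetical.

/**
 * Yield each <game> element as a standalone SimpleXMLElement, so that
 * roughly one record at a time is materialized.
 */
function streamGames(string $path): Generator
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        // Advance to the first <game> element.
        $found = false;
        while ($reader->read()) {
            if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'game') {
                $found = true;
                break;
            }
        }
        while ($found) {
            // Serialize only the current record and parse it on its own.
            yield new SimpleXMLElement($reader->readOuterXml());
            // Skip the current subtree and move to the next node named "game".
            $found = $reader->next('game');
        }
    } finally {
        $reader->close();
    }
}

// Demo with a tiny temporary file standing in for the real 5 MB document.
$path = tempnam(sys_get_temp_dir(), 'games');
file_put_contents(
    $path,
    '<games><game><title>Pong</title></game><game><title>Tetris</title></game></games>'
);

foreach (streamGames($path) as $game) {
    echo $game->title, PHP_EOL;
}
unlink($path);
```

This version assumes the game elements are siblings; next('game') skips each record's subtree rather than descending into it.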




Answer 3:


You can use DOMXPath for querying, which is way faster than the DOMDocument::getElementsByTagName() method.

<?php
$xpath = new \DOMXpath($dom);
$games = $xpath->query("//game");

foreach ($games as $game) {
    // Code here
}

In one of my tests with a fairly large file, this approach took less than 1 second to iterate over 24k elements, whereas the DOMDocument::getElementsByTagName() method took about 27 minutes (and the time taken to advance to the next object grew exponentially).
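
To check the difference on your own setup, a small timing sketch like the following may help. The synthetic document and the element count are assumptions, and the numbers will vary with the PHP and libxml versions, so treat the output as a local measurement rather than a confirmation of the figures above. One likely factor is that getElementsByTagName() returns a live node list, while an XPath query returns a static snapshot.

```php
<?php
// Sketch only: the synthetic document and the element count are assumptions.
$count = 2000;
$xml = '<games>' . str_repeat('<game><title>x</title></game>', $count) . '</games>';

$dom = new DOMDocument();
$dom->loadXML($xml);

// Time iteration over the live node list from getElementsByTagName().
$start = microtime(true);
$viaTagName = 0;
foreach ($dom->getElementsByTagName('game') as $game) {
    $viaTagName++;
}
$tagNameSeconds = microtime(true) - $start;

// Time iteration over the static result of an XPath query.
$start = microtime(true);
$xpath = new DOMXPath($dom);
$viaXPath = 0;
foreach ($xpath->query('//game') as $game) {
    $viaXPath++;
}
$xpathSeconds = microtime(true) - $start;

printf("getElementsByTagName: %d nodes in %.4f s\n", $viaTagName, $tagNameSeconds);
printf("DOMXPath::query:      %d nodes in %.4f s\n", $viaXPath, $xpathSeconds);
```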



Source: https://stackoverflow.com/questions/7299957/xml-domdocument-optimization
