Stream parse 4 GB XML file in PHP

孤城傲影 2020-12-18 05:47

I'm trying to do the following and need some help:

I want to stream parse a large XML file (4 GB) with PHP. I can't use SimpleXML or DOM because they load the e

2 answers
  •  独厮守ぢ
    2020-12-18 06:33

    As you said, a DOM parser is not an option here: the file will not fit in memory, and even if it did, it would be slow, because the parser first loads the entire file and only then lets you iterate over it. For this case, try a SAX parser (event/stream oriented) instead. Add handlers for the tags you are interested in (doc, title, url, abstract), and on every event append the node you found to the new XML file.

    More information is available here:

    What is the fastest XML parser in PHP?

    Here is an untested sample of what the code could look like:

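    <?php
        // The opening lines of this sample were lost; the setup and the start of
        // startElement() are reconstructed from the rest of the code (assumption).
        $file = "input.xml";            // placeholder: path of the large source file
        $fh = fopen("output.xml", "w"); // placeholder: path of the filtered copy
        $tags = array_flip(array("doc", "title", "url", "abstract")); // keys for isset()
        $currentNodeTag = "";

        function startElement($parser, $name, $attrs) {
            global $tags, $fh, $currentNodeTag;

            // Names arrive upper-cased (case folding is on by default).
            if (isset($tags[strtolower($name)])) {
                $currentNodeTag = strtolower($name);
                fwrite($fh, sprintf("<%s>\n", $currentNodeTag));
            }
        }

        function endElement($parser, $name) {
            global $tags, $fh, $currentNodeTag;

            if (isset($tags[strtolower($name)])) {
                fwrite($fh, sprintf("</%s>\n", strtolower($name)));
                $currentNodeTag = "";
            }
        }

        function characterData($parser, $data) {
            global $fh, $currentNodeTag;

            if (!empty($currentNodeTag)) {
                // Entities arrive decoded, so escape them again on output.
                fwrite($fh, htmlspecialchars($data, ENT_XML1));
            }
        }

        $xmlParser = xml_parser_create();
        xml_set_element_handler($xmlParser, "startElement", "endElement");
        xml_set_character_data_handler($xmlParser, "characterData");

        if (!$fh || !($fp = fopen($file, "r"))) {
            die("could not open XML input or output");
        }

        // Read in 4 KB chunks; the last call to xml_parse() gets is_final = true.
        while (!feof($fp)) {
            $data = fread($fp, 4096);
            if ($data === false) {
                die("could not read XML input");
            }
            if (!xml_parse($xmlParser, $data, feof($fp))) {
                die(sprintf("XML error: %s at line %d",
                            xml_error_string(xml_get_error_code($xmlParser)),
                            xml_get_current_line_number($xmlParser)));
            }
        }

        xml_parser_free($xmlParser);
        fclose($fp);
        fclose($fh);
    ?>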
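
Before running handlers like these on the 4 GB file, it may help to try the same idea on a tiny string fed to the parser in small chunks. The following is only a sketch under assumptions, not the answer's code: `filterXml`, its parameters, and the sample input are hypothetical names made up for illustration, it uses closures instead of globals, and it keeps only text that sits directly inside a wanted tag.

```php
<?php
// Hypothetical helper: keep only the wanted tags (and their direct text)
// from an XML string that is pushed to the parser in small chunks.
function filterXml(string $xml, array $wanted, int $chunkSize = 16): string
{
    $keep    = array_flip($wanted); // tag name => index, so isset() works
    $out     = '';
    $current = '';                  // wanted tag we are directly inside, or ''

    $parser = xml_parser_create();
    // Report tag names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        function ($p, string $name, array $attrs) use (&$out, &$current, $keep) {
            // Entering any element resets the state; wanted ones are echoed.
            $current = isset($keep[$name]) ? $name : '';
            if ($current !== '') {
                $out .= "<$name>";
            }
        },
        function ($p, string $name) use (&$out, &$current, $keep) {
            if (isset($keep[$name])) {
                $out .= "</$name>";
            }
            $current = '';
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, string $data) use (&$out, &$current) {
            if ($current !== '') {
                // The parser hands over decoded text, so escape it again.
                $out .= htmlspecialchars($data, ENT_XML1);
            }
        }
    );

    // Push the document piece by piece, as the fread() loop would.
    $chunks = str_split($xml, $chunkSize);
    $last   = count($chunks) - 1;
    foreach ($chunks as $i => $chunk) {
        if (!xml_parse($parser, $chunk, $i === $last)) {
            throw new RuntimeException(
                xml_error_string(xml_get_error_code($parser))
            );
        }
    }

    return $out;
}

// Sample input (made up): <links> is not in the wanted list and is dropped.
$sample = '<feed><doc><title>A &amp; B</title><links>x</links>'
        . '<url>http://e.test/1</url></doc></feed>';
echo filterXml($sample, ['doc', 'title', 'url']), "\n";
```

Because the parser keeps its state between `xml_parse()` calls, the chunk boundaries can fall anywhere, even in the middle of a tag or an entity. That is what makes the fixed-size `fread()` loop safe on a file of any size.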
    
