Problem - XML declaration allowed only at the start of the document

野的像风 2020-12-11 03:04

xml:19558: parser error : XML declaration allowed only at the start of the document

Any solutions? I am using PHP's XMLReader to parse a large XML file, but getting th

3 Answers
  • 2020-12-11 03:09

    Another possible cause of this problem is a byte order mark (BOM) at the head of the file. If your XML is saved as UTF-8 with a BOM, the file content starts with the 3 bytes "EF BB BF". These bytes may be interpreted incorrectly if the data is converted from a byte array to a string along the way. The solution is to write the byte array to the file directly, without first converting it to a string.

    File heads by encoding:
    ASCII: none
    UTF-16 little-endian (often labelled "Unicode"): FF FE
    UTF-8: EF BB BF
    UTF-32 little-endian: FF FE 00 00

    Just open the file in UltraEdit (or any other hex editor) and you can see these bytes.
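
    If the bytes turn out to be there, one option is to strip them before handing the string to the parser. The following is only a minimal sketch: `stripBomAndLeadingWhitespace` is a hypothetical helper name, and it assumes the document is small enough to hold in memory as a single string.

```php
<?php
// Hypothetical helper: remove a leading UTF-8 byte order mark and any
// white space that precedes the XML declaration.
function stripBomAndLeadingWhitespace(string $xml): string
{
    // Drop the UTF-8 BOM (bytes EF BB BF) if the string starts with it.
    if (strncmp($xml, "\xEF\xBB\xBF", 3) === 0) {
        $xml = (string) substr($xml, 3);
    }
    // The XML declaration must be the very first thing in the document,
    // so remove leading spaces, tabs and newlines as well.
    return ltrim($xml);
}
```

    The cleaned string can then be passed to `$reader->xml(...)` as in the other answers. This only helps when the stray bytes sit at the very start of the data; a BOM in the middle of the file usually points to concatenated documents.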

  • 2020-12-11 03:13

    If you have multiple XML declarations, you likely have a concatenation of many XML files, and also more than one root element. It's not clear how you would meaningfully parse them.

    Try really hard to get the source of the XML to give you real XML first. If that doesn't work, see if you can do some preprocessing to fix the XML before you parse it.
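
    If preprocessing is the only option, a rough sketch of it could look like the code below. It assumes the file is a plain concatenation of complete XML documents, that each one begins with its own `<?xml ...` declaration, and that the text `<?xml ` never occurs inside a comment or CDATA section. `splitConcatenatedXml` and `readRecords` are hypothetical names, and `record` is the element name used elsewhere in this thread.

```php
<?php
// Hypothetical helper: split a string holding several concatenated XML
// documents into one string per document, using every "<?xml" declaration
// as a boundary.
function splitConcatenatedXml(string $data): array
{
    // The lookahead splits *before* each declaration, so the declaration
    // stays at the start of the piece it belongs to.
    $parts = preg_split('/(?=<\?xml\s)/', $data, -1, PREG_SPLIT_NO_EMPTY);
    if ($parts === false) {
        return [];
    }
    // Trim white space between documents and drop pieces that end up empty.
    $parts = array_map('trim', $parts);
    return array_values(array_filter($parts, static function ($s) {
        return $s !== '';
    }));
}

// Collect the inner XML of every <record> element in one document.
function readRecords(string $xml): array
{
    $records = [];
    $reader = new XMLReader();
    $reader->XML($xml);
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'record') {
            $records[] = $reader->readInnerXml();
        }
    }
    $reader->close();
    return $records;
}
```

    Each string returned by `splitConcatenatedXml` has exactly one declaration at its start, so it can be parsed on its own. This version still needs the whole file in memory, so it is not suited to very large inputs.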

  • 2020-12-11 03:30

    Make sure there isn't any white space before the first tag. Try this:

    <?php
    //Declarations
    $file = "data.txt"; //The file to read from.
    
    #Read the whole file into memory
    $data = file_get_contents($file);
    if ($data === false)
        die("Could not read $file");
    #End read file
    
    //Split after each closing </xml> tag so every document becomes its own string.
    //This assumes the root element of every document is named <xml>.
    $split = preg_split('/(?<=<\/xml>)(?!$)/', $data);
    
    foreach ($split as $sxml) //Loop through each xml string
    {
        $sxml = trim($sxml); //Remove white space before the XML declaration
        if ($sxml === '')
            continue; //Nothing left to parse in this piece
        $reader = new XMLReader(); //Initialize the reader
        $reader->xml($sxml) or die("Could not load XML string"); //Open the current xml string
        while ($reader->read()) //Read it node by node
        {
            if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'record')
            {
                $dataa = $reader->readInnerXml(); //Get contents of the <record> tag.
                echo $dataa; //Print it to screen.
            }
        }
        $reader->close(); //Close reader
    }
    ?>
    

    Set the $file variable to the file you want. Note that this version loads the entire file into memory, so I don't know how well it will work for a 4 GB file. Tell me if it doesn't.

    EDIT: Here is another solution. It should work better with the larger file, because it parses each document as it reads the file instead of loading everything first.

    <?php
    set_time_limit(0);
    //Declarations
    $file = "data.txt"; //The file to read from.
    
    #Read the file
    $fp = fopen($file, "r") or die("Couldn't Open"); //Open the file
    
    $FoundXmlTagStep = 0;
    $FoundEndXMLTagStep = 0;
    $curXML = "";
    $firstXMLTagRead = false;
    while(!feof($fp)) //Loop through the file, read it till the end.
    {
        $data = fgetc($fp); //Read one character at a time
        if ($FoundXmlTagStep==0 && $data == "<")
            $FoundXmlTagStep=1;
        else if ($FoundXmlTagStep==1 && $data == "x")
            $FoundXmlTagStep=2;
        else if ($FoundXmlTagStep==2 && $data == "m")
            $FoundXmlTagStep=3;
        else if ($FoundXmlTagStep==3 && $data == "l")
        {
            $FoundXmlTagStep=4;
            $firstXMLTagRead = true;
        }
        else if ($FoundXmlTagStep!=4) //Mismatch: restart, letting a "<" begin a new candidate
            $FoundXmlTagStep = ($data == "<") ? 1 : 0;
    
        if ($FoundXmlTagStep==4)
        {
            if ($firstXMLTagRead)
            {
                $firstXMLTagRead = false;
                $curXML = "<xm";
            }
            $curXML .= $data;
    
            //Start trying to match end of xml
            if ($FoundEndXMLTagStep==0 && $data == "<")
                $FoundEndXMLTagStep=1;
            elseif ($FoundEndXMLTagStep==1 && $data == "/")
                $FoundEndXMLTagStep=2;
            elseif ($FoundEndXMLTagStep==2 && $data == "x")
                $FoundEndXMLTagStep=3;
            elseif ($FoundEndXMLTagStep==3 && $data == "m")
                $FoundEndXMLTagStep=4;
            elseif ($FoundEndXMLTagStep==4 && $data == "l")
                $FoundEndXMLTagStep=5;
            elseif ($FoundEndXMLTagStep==5 && $data == ">")
            {
                $FoundEndXMLTagStep=0;
                $FoundXmlTagStep=0;
                #finished Reading XML
                ParseXML ($curXML);
            }
            else //Mismatch at any step: restart, letting a "<" begin a new candidate
                $FoundEndXMLTagStep = ($data == "<") ? 1 : 0;
        }
    } 
    fclose($fp); //Close file
    function ParseXML ($xml)
    {
        $reader = new XMLReader(); //Initialize the reader
        $reader->xml($xml) or die("Could not load XML string"); //Open the current xml string
        while($reader->read()) //Read it node by node
        {
            if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'record')
            {
                $dataa = $reader->readInnerXml(); //Get contents of the <record> tag.
                echo $dataa; //Print it to screen.
            }
        }
        $reader->close(); //Close reader
    }
    ?>
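
    Reading one character at a time is slow for a file of several gigabytes. As a rough alternative sketch, the file can be read in larger chunks and cut wherever a new `<?xml` declaration begins, so that only the current document is held in memory. `streamXmlDocuments` and its `$chunkSize` parameter are hypothetical. The sketch assumes every document starts with its own declaration, that no BOM precedes the first one, and that `<?xml` never appears inside a comment or CDATA section.

```php
<?php
// Hypothetical helper: stream a file of concatenated XML documents and
// yield one complete document string at a time.
function streamXmlDocuments(string $path, int $chunkSize = 8192): Generator
{
    $marker = '<?xml';
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        throw new RuntimeException("Could not open $path");
    }
    $buffer = '';
    $searchFrom = 1; // offset 0 holds the declaration of the current document
    try {
        while (!feof($fp)) {
            $chunk = fread($fp, $chunkSize);
            if ($chunk === false) {
                break;
            }
            $buffer .= $chunk;
            // Every further declaration in the buffer ends the current document.
            while (strlen($buffer) > $searchFrom
                && ($pos = strpos($buffer, $marker, $searchFrom)) !== false) {
                $doc = trim(substr($buffer, 0, $pos));
                $buffer = substr($buffer, $pos);
                $searchFrom = 1;
                if ($doc !== '') {
                    yield $doc;
                }
            }
            // Resume just before the buffer end next time, so a marker that
            // is split across two chunks is still found.
            $searchFrom = max(1, strlen($buffer) - strlen($marker) + 1);
        }
        // Whatever remains after the last declaration is the final document.
        $doc = trim($buffer);
        if ($doc !== '') {
            yield $doc;
        }
    } finally {
        fclose($fp);
    }
}
```

    Each yielded string could then be passed to the ParseXML function above, for example `foreach (streamXmlDocuments($file) as $xml) { ParseXML($xml); }`. Memory use is bounded by the largest single document rather than by the whole file.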
    