I have a 15 GB XML file that I want to split. It has approximately 300 million lines. It doesn't have any top nodes which are interdependent. Is there any
Here is a script with a low memory footprint that does this in the free firstobject XML editor (foxe) using CMarkup file mode. I am not sure what you mean by no interdependent top nodes, or by tag checking, but I assume that under the root element you have millions of top-level elements, each holding object properties or rows that must be kept together as a unit. If you want, say, 1 million of them per output file, you could do this:
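split_xml_15GB()
{
  int nObjectCount = 0, nFileCount = 0;
  CMarkup xmlInput, xmlOutput;

  // Open the input in read file mode so it is streamed, not loaded whole
  xmlInput.Open( "15GB.xml", MDF_READFILE );
  xmlInput.FindElem(); // root
  str sRootTag = xmlInput.GetTagName();
  xmlInput.IntoElem();

  // Loop over the top-level elements under the root
  while ( xmlInput.FindElem() )
  {
    if ( nObjectCount == 0 )
    {
      // Start a new output file with the same root tag as the input
      ++nFileCount;
      xmlOutput.Open( "piece" + nFileCount + ".xml", MDF_WRITEFILE );
      xmlOutput.AddElem( sRootTag );
      xmlOutput.IntoElem();
    }

    // Copy the current element, with everything inside it, as one unit
    xmlOutput.AddSubDoc( xmlInput.GetSubDoc() );
    ++nObjectCount;

    if ( nObjectCount == 1000000 )
    {
      // This piece is full; close it so the next element starts a new file
      xmlOutput.Close();
      nObjectCount = 0;
    }
  }

  // Close the last, partially filled piece if there is one
  if ( nObjectCount )
    xmlOutput.Close();
  xmlInput.Close();
  return nFileCount;
}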
I posted a YouTube video and an article about this here:
http://www.firstobject.com/xml-splitter-script-video.htm
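
If foxe is not an option, the same streaming idea can be sketched in Python with the standard library's xml.etree.ElementTree.iterparse. This is only a rough equivalent under some assumptions, not a port of CMarkup: the function name split_xml, its parameters, and the "piece" prefix are hypothetical, the input is assumed to use no XML namespaces, and, like the script above, only the root tag name (not its attributes) is copied into each piece.

```python
import xml.etree.ElementTree as ET


def split_xml(input_path, per_file, prefix="piece"):
    """Write the root's direct children into files of at most per_file each.

    Hypothetical helper: assumes no namespaces and ignores root attributes.
    Returns the number of output files written.
    """
    file_count = 0
    object_count = 0
    out = None
    root = None
    depth = 0

    for event, elem in ET.iterparse(input_path, events=("start", "end")):
        if event == "start":
            depth += 1
            if depth == 1:
                root = elem  # remember the root so its tag can be reused
            continue

        depth -= 1
        if depth != 1:
            continue  # a nested element, or the root itself closing

        if object_count == 0:
            # Start a new piece with the same root tag as the input
            file_count += 1
            out = open(f"{prefix}{file_count}.xml", "w", encoding="utf-8")
            out.write(f"<{root.tag}>\n")

        elem.tail = None  # drop whitespace that followed the element
        out.write(ET.tostring(elem, encoding="unicode"))
        out.write("\n")
        object_count += 1
        root.clear()  # release the copied element so memory stays flat

        if object_count == per_file:
            out.write(f"</{root.tag}>\n")
            out.close()
            object_count = 0

    if object_count:
        # Close the last, partially filled piece
        out.write(f"</{root.tag}>\n")
        out.close()
    return file_count
```

Calling split_xml("15GB.xml", 1000000) would write piece1.xml, piece2.xml, and so on into the current directory. I have not timed this against a file of that size, so treat it as a starting point rather than a tested replacement for the script.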