XML Split of a Large File

心在旅途 2021-01-04 00:41

I have a 15 GB XML file that I want to split. It has approximately 300 million lines. It doesn't have any top-level nodes that are interdependent. Is there any

10 answers
  •  夕颜 (original poster)
    2021-01-04 00:54

    Here is a low-memory script that does this in the free firstobject XML editor (foxe) using CMarkup file mode. I am not sure what you mean by no interdependent top nodes, or by tag checking. But assuming that under the root element you have millions of top-level elements, each holding object properties or rows that must be kept together as a unit, and that you want, say, 1 million of them per output file, you could do this:

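    split_xml_15GB()
    {
      int nObjectCount = 0, nFileCount = 0;
      CMarkup xmlInput, xmlOutput;
      // Open the input in read file mode so it is streamed, not loaded whole
      xmlInput.Open( "15GB.xml", MDF_READFILE );
      xmlInput.FindElem(); // root
      str sRootTag = xmlInput.GetTagName();
      xmlInput.IntoElem();
      // Visit each top-level element under the root
      while ( xmlInput.FindElem() )
      {
        if ( nObjectCount == 0 )
        {
          // Start a new output piece with the same root tag
          ++nFileCount;
          xmlOutput.Open( "piece" + nFileCount + ".xml", MDF_WRITEFILE );
          xmlOutput.AddElem( sRootTag );
          xmlOutput.IntoElem();
        }
        // Copy the whole element, with its children, as one unit
        xmlOutput.AddSubDoc( xmlInput.GetSubDoc() );
        ++nObjectCount;
        if ( nObjectCount == 1000000 )
        {
          // Piece is full; close it so the next element opens a new one
          xmlOutput.Close();
          nObjectCount = 0;
        }
      }
      // Close the last, partially filled piece
      if ( nObjectCount )
        xmlOutput.Close();
      xmlInput.Close();
      return nFileCount;
    }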

    I posted a YouTube video and an article about this here:

    http://www.firstobject.com/xml-splitter-script-video.htm
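For readers who do not use foxe, the same streaming idea can be sketched in Python with the standard library's `xml.etree.ElementTree.iterparse`. This is not the script above, only a rough equivalent under some assumptions: the function name `split_xml`, the `output_prefix` and `per_file` parameters are made up for illustration, the root element is assumed to have no namespace, and, as in the script above, only the root's tag name (not its attributes) is carried into each piece.

```python
import xml.etree.ElementTree as ET


def split_xml(input_path, output_prefix, per_file=1_000_000):
    """Stream input_path and write per_file top-level elements to each piece.

    Hypothetical helper: returns the number of piece files written.
    """
    file_count = 0
    object_count = 0
    depth = 0
    root = None
    out = None

    for event, elem in ET.iterparse(input_path, events=("start", "end")):
        if event == "start":
            depth += 1
            if depth == 1:
                root = elem  # remember the root so finished children can be dropped
            continue

        depth -= 1
        if depth != 1:
            continue  # only act when a direct child of the root has just ended

        if object_count == 0:
            # Start a new piece with the same root tag (assumes no namespace)
            file_count += 1
            out = open(f"{output_prefix}{file_count}.xml", "w", encoding="utf-8")
            out.write(f"<{root.tag}>\n")

        # Serialize the whole child element, including its subtree, as one unit
        out.write(ET.tostring(elem, encoding="unicode"))
        object_count += 1
        root.clear()  # release the finished child so memory use stays flat

        if object_count == per_file:
            out.write(f"</{root.tag}>\n")
            out.close()
            object_count = 0

    # Close the last, partially filled piece
    if object_count:
        out.write(f"</{root.tag}>\n")
        out.close()
    return file_count
```

Calling `split_xml("15GB.xml", "piece")` would write `piece1.xml`, `piece2.xml`, and so on to the current directory. Throughput on a 15 GB input has not been measured here, so treat it as a starting point rather than a tuned tool.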
