What is the fastest XML Parser available for Delphi?

梦谈多话 2020-12-04 14:46

We have reasonably large XML strings which we currently parse using MSXML2.

I have just tried MSXML6, hoping for a speed improvement, and saw none!

5 answers
  • 2020-12-04 15:14

    Recently I had a similar issue where the MSXML DOM parser proved too slow for the task. I had to parse rather large documents (> 1 MB), and the memory consumption of the DOM parser was prohibitive. My solution was to drop the DOM parser altogether and use the event-driven MSXML SAX parser instead, which proved to be much, much faster. Unfortunately the programming model is totally different, but depending on the task it might be worth it. Craig Murphy has published an excellent article on how to use the MSXML SAX parser in Delphi: SAX, Delphi and Ex Em El
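To give a feel for how different the event-driven model is, here is a minimal sketch. It does not use MSXML at all: `ScanStartElements`, `TStartElementEvent` and `TElementCounter` are hypothetical names invented for this illustration, and the scanner is a toy that assumes well-formed input without comments or CDATA sections. The point is only that the handler is called once per start tag and no document tree is ever built.

```pascal
program SaxStyleSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical callback type: invoked once per start tag, SAX-style.
  TStartElementEvent = procedure(const AName: string) of object;

  // Hypothetical handler that only counts elements; it keeps no tree.
  TElementCounter = class
  public
    Count: Integer;
    procedure HandleStartElement(const AName: string);
  end;

procedure TElementCounter.HandleStartElement(const AName: string);
begin
  Inc(Count);
end;

// True for characters that terminate an element name in this toy scanner.
function IsNameEnd(const C: Char): Boolean;
begin
  Result := (C = ' ') or (C = '/') or (C = '>') or
            (C = #9) or (C = #10) or (C = #13);
end;

// Toy scanner, NOT a conforming XML parser: it reports each start tag
// to the callback and skips end tags, processing instructions and
// declarations. Comments and CDATA are assumed to be absent.
procedure ScanStartElements(const AXml: string;
  const AOnStart: TStartElementEvent);
var
  I, J, Len: Integer;
begin
  Len := Length(AXml);
  I := 1;
  while I <= Len do
  begin
    if (AXml[I] = '<') and (I < Len) and (AXml[I + 1] <> '/') and
       (AXml[I + 1] <> '?') and (AXml[I + 1] <> '!') then
    begin
      J := I + 1;
      while (J <= Len) and not IsNameEnd(AXml[J]) do
        Inc(J);
      AOnStart(Copy(AXml, I + 1, J - I - 1));
      I := J;
    end
    else
      Inc(I);
  end;
end;

var
  Counter: TElementCounter;
begin
  Counter := TElementCounter.Create;
  try
    ScanStartElements('<Data><a type="tkInteger">1</a><b/></Data>',
      Counter.HandleStartElement);
    WriteLn('start elements seen: ', Counter.Count);
  finally
    Counter.Free;
  end;
end.
```

With a real SAX parser the handler would receive further events (end tags, character data, attributes), and the application has to keep whatever state it needs between callbacks, which is where the extra programming effort comes from.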

  • 2020-12-04 15:21

    I know that it's an old question, but people might find it interesting:

    I wrote a new XML library for Delphi (OXml): http://www.kluug.net/oxml.php

    It features direct XML handling (read+write), SAX parser, DOM and a sequential DOM parser. One of the benefits is that OXml supports Delphi 6-Delphi XE5, FPC/Lazarus and C++Builder on all platforms (Win, MacOSX, Linux, iOS, Android).

    OXml DOM is record/pointer based and offers better performance than any other XML library:

    The read test returns the time the parser needs to read a custom XML DOM from a file (column "load") and to write node values to a constant dummy function (column "navigate"). The file is encoded in UTF-8 and its size is about 5,6 MB.

    [Image: XML parse comparison]

    The write test returns the time the parser needs to create a DOM (column "create") and write this DOM to a file (column "save"). The file is encoded in UTF-8 and its size is about 11 MB.

    [Image: XML write comparison]

    + The poor OmniXML (original) writing performance was the result of the fact that OmniXML didn't use buffering for writing. Thus writing to a TFileStream was very slow. I updated OmniXML and added buffering support. You can get the latest OmniXML code from the SVN.

  • 2020-12-04 15:21

    Some time ago I wrote a very simple XML test suite. It covers MSXML (D7, MSXML3?), OmniXML (a bit old) and Jedi XML (latest stable).

    Test results for 1,52 MB file:

    XML file loading time MSXML: 240,20 [ms]

    XML node selections MSXML: 1,09 [s]

    XML file loading time OmniXML: 2,25 [s]

    XML node selections OmniXML: 1,22 [s]

    XML file loading time JclSimpleXML: 2,11 [s]

    and access violation for JclSimpleXML node selections :|

    Unfortunately I don't have much time to fix the above AV, but the sources are included below...

    XmlEngines.dpr

    program XmlEngines;
    
    uses
      FastMM4,
      Forms,
      fmuMain in 'fmuMain.pas' {fmMain},
      uXmlEngines in 'uXmlEngines.pas',
      ifcXmlEngine in 'ifcXmlEngine.pas';
    
    {$R *.res}
    
    begin
      Application.Initialize;
      Application.Title := 'XML Engine Tester';
      Application.CreateForm(TfmMain, fmMain);
      Application.Run;
    end.
    

    fmuMain.pas

    unit fmuMain;
    
    interface
    
    uses
      Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
      Dialogs, xmldom, XMLIntf, msxmldom, XMLDoc,
      //
      ifcXmlEngine, StdCtrls;
    
    type
      TfmMain = class(TForm)
        mmoDebug: TMemo;
        dlgOpen: TOpenDialog;
    
        procedure FormCreate(Sender: TObject);
        procedure FormDestroy(Sender: TObject);
    
        procedure mmoDebugClick(Sender: TObject);
    
      private
        fXmlEngines: TInterfaceList;
        function Get_Engine(const aIx: Integer): IXmlEngine;
    
      protected
        property XmlEngine[const aIx: Integer]: IXmlEngine read Get_Engine;
    
        procedure Debug(const aInfo: string); // inline
    
      public
        procedure RegisterXmlEngine(const aEngine: IXmlEngine);
    
      end;
    
    var
      fmMain: TfmMain;
    
    implementation
    
    {$R *.dfm}
    
    uses
      uXmlEngines, TZTools;
    
    { TfmMain }
    
    function TfmMain.Get_Engine(const aIx: Integer): IXmlEngine;
    begin
      Result:= nil;
      Supports(fXmlEngines[aIx], IXmlEngine, Result)
    end;
    
    procedure TfmMain.RegisterXmlEngine(const aEngine: IXmlEngine);
    var
      Ix: Integer;
    begin
      if aEngine = nil then
        Exit; // WARNING: early exit
    
      for Ix:= 0 to Pred(fXmlEngines.Count) do
        if XmlEngine[Ix] = aEngine then
          Exit; // WARNING: early exit (engine already registered)
    
      fXmlEngines.Add(aEngine)
    end;
    
    procedure TfmMain.FormCreate(Sender: TObject);
    begin
      fXmlEngines:= TInterfaceList.Create();
      dlgOpen.InitialDir:= ExtractFileDir(ParamStr(0));
      RegisterXmlEngine(TMsxmlEngine.Create(Self));
      RegisterXmlEngine(TOmniXmlEngine.Create());
      RegisterXmlEngine(TJediXmlEngine.Create());
    end;
    
    procedure TfmMain.mmoDebugClick(Sender: TObject);
    
      procedure TestEngines(const aFilename: TFileName);
    
        procedure TestEngine(const aEngine: IXmlEngine);
        var
          PerfCheck: TPerfCheck;
          Ix: Integer;
        begin
          PerfCheck := TPerfCheck.Create();
          try
    
            PerfCheck.Init(True);
            PerfCheck.Start();
            aEngine.Load(aFilename);
            PerfCheck.Pause();
            Debug(Format(
              'XML file loading time %s: %s',
              [aEngine.Get_ID(), PerfCheck.TimeStr()]));
    
            if aEngine.Get_ValidNode() then
            begin
              PerfCheck.Start();
              for Ix:= 0 to 999999 do
                if aEngine.Get_ChildsCount() > 0 then
                begin
    
                  aEngine.SelectChild(Ix mod aEngine.Get_ChildsCount());
    
                end
                else
                  aEngine.SelectRootNode();
    
              PerfCheck.Pause();
              Debug(Format(
                'XML nodes selections %s: %s',
                [aEngine.Get_ID(), PerfCheck.TimeStr()]));
            end
    
          finally
            PerfCheck.Free();
          end
        end;
    
      var
        Ix: Integer;
      begin
        Debug(aFilename);
        for Ix:= 0 to Pred(fXmlEngines.Count) do
          TestEngine(XmlEngine[Ix])
      end;
    
    var
      CursorBckp: TCursor;
    begin
      if dlgOpen.Execute() then
      begin
    
        CursorBckp:= Cursor;
        Self.Cursor:= crHourGlass;
        mmoDebug.Cursor:= crHourGlass;
        try
          TestEngines(dlgOpen.FileName)
        finally
          Self.Cursor:= CursorBckp;
          mmoDebug.Cursor:= CursorBckp;
        end
    
      end
    end;
    
    procedure TfmMain.Debug(const aInfo: string);
    begin
      mmoDebug.Lines.Add(aInfo)
    end;
    
    procedure TfmMain.FormDestroy(Sender: TObject);
    begin
      fXmlEngines.Free()
    end;
    
    end.
    

    ifcXmlEngine.pas

    unit ifcXmlEngine;
    
    interface
    
    uses
      SysUtils;
    
    type
      TFileName = SysUtils.TFileName;
    
      IXmlEngine = interface
        ['{AF77333B-9873-4FDE-A3B1-260C7A4D3357}']
        procedure Load(const aFilename: TFileName);
        procedure SelectRootNode();
        procedure SelectChild(const aIndex: Integer);
        procedure SelectParent();
        //
        function Get_ID(): string;
        function Get_ValidNode(): Boolean;
        function Get_ChildsCount(): Integer;
        function Get_HaveParent(): Boolean;
        //function Get_NodeName(): Boolean;
      end;
    
    implementation
    
    end.
    

    uXmlEngines.pas

    unit uXmlEngines;
    
    interface
    
    uses
      Classes,
      //
      XMLDoc, XMLIntf, OmniXml, JclSimpleXml,
      //
      ifcXmlEngine;
    
    type
      TMsxmlEngine = class(TInterfacedObject, IXmlEngine)
      private
        fXmlDoc: XMLDoc.TXMLDocument;
        fNode: XMLIntf.IXMLNode;
    
      protected
    
      public
        constructor Create(const aOwner: TComponent);
        destructor Destroy; override;
    
        procedure Load(const aFilename: TFileName);
        procedure SelectRootNode();
        procedure SelectChild(const aIndex: Integer);
        procedure SelectParent();
        //
        function Get_ID(): string;
        function Get_ValidNode(): Boolean;
        function Get_ChildsCount(): Integer;
        function Get_HaveParent(): Boolean;
        //function Get_NodeName(): Boolean;
    
      end;
    
      TOmniXmlEngine = class(TInterfacedObject, IXmlEngine)
      private
        fXmlDoc: OmniXml.IXmlDocument;
        fNode: OmniXml.IXMLNode;
    
      protected
    
      public
        constructor Create;
        destructor Destroy; override;
    
        procedure Load(const aFilename: TFileName);
        procedure SelectRootNode();
        procedure SelectChild(const aIndex: Integer);
        procedure SelectParent();
        //
        function Get_ID(): string;
        function Get_ValidNode(): Boolean;
        function Get_ChildsCount(): Integer;
        function Get_HaveParent(): Boolean;
        //function Get_NodeName(): Boolean;
    
      end;
    
      TJediXmlEngine = class(TInterfacedObject, IXmlEngine)
      private
        fXmlDoc: TJclSimpleXML;
        fNode: TJclSimpleXMLElem;
    
      protected
    
      public
        constructor Create();
        destructor Destroy(); override;
    
        procedure Load(const aFilename: TFileName);
        procedure SelectRootNode();
        procedure SelectChild(const aIndex: Integer);
        procedure SelectParent();
        //
        function Get_ID(): string;
        function Get_ValidNode(): Boolean;
        function Get_ChildsCount(): Integer;
        function Get_HaveParent(): Boolean;
        //function Get_NodeName(): Boolean;
    
      end;
    
    implementation
    
    uses
      SysUtils;
    
    { TMsxmlEngine }
    
    constructor TMsxmlEngine.Create(const aOwner: TComponent);
    begin
      if aOwner = nil then
        raise Exception.Create('TMsxmlEngine.Create() -> invalid owner');
    
      inherited Create();
      fXmlDoc:= XmlDoc.TXmlDocument.Create(aOwner);
      fXmlDoc.ParseOptions:= [poPreserveWhiteSpace]
    end;
    
    destructor TMsxmlEngine.Destroy;
    begin
      fXmlDoc.Free();
      inherited Destroy()
    end;
    
    function TMsxmlEngine.Get_ChildsCount: Integer;
    begin
      Result:= fNode.ChildNodes.Count
    end;
    
    function TMsxmlEngine.Get_HaveParent: Boolean;
    begin
      Result:= fNode.ParentNode <> nil
    end;
    
    function TMsxmlEngine.Get_ID: string;
    begin
      Result:= 'MSXML'
    end;
    
    //function TMsxmlEngine.Get_NodeName: Boolean;
    //begin
    //  Result:= fNode.Text
    //end;
    
    function TMsxmlEngine.Get_ValidNode: Boolean;
    begin
      Result:= fNode <> nil
    end;
    
    procedure TMsxmlEngine.Load(const aFilename: TFileName);
    begin
      fXmlDoc.LoadFromFile(aFilename);
      SelectRootNode()
    end;
    
    procedure TMsxmlEngine.SelectChild(const aIndex: Integer);
    begin
      fNode:= fNode.ChildNodes.Get(aIndex)
    end;
    
    procedure TMsxmlEngine.SelectParent;
    begin
      fNode:= fNode.ParentNode
    end;
    
    procedure TMsxmlEngine.SelectRootNode;
    begin
      fNode:= fXmlDoc.DocumentElement
    end;
    
    { TOmniXmlEngine }
    
    constructor TOmniXmlEngine.Create;
    begin
      inherited Create();
      fXmlDoc:= OmniXml.TXMLDocument.Create();
      fXmlDoc.PreserveWhiteSpace:= true
    end;
    
    destructor TOmniXmlEngine.Destroy;
    begin
      fXmlDoc:= nil;
      inherited Destroy()
    end;
    
    function TOmniXmlEngine.Get_ChildsCount: Integer;
    begin
      Result:= fNode.ChildNodes.Length
    end;
    
    function TOmniXmlEngine.Get_HaveParent: Boolean;
    begin
      Result:= fNode.ParentNode <> nil
    end;
    
    function TOmniXmlEngine.Get_ID: string;
    begin
      Result:= 'OmniXML'
    end;
    
    //function TOmniXmlEngine.Get_NodeName: Boolean;
    //begin
    //  Result:= fNode.NodeName
    //end;
    
    function TOmniXmlEngine.Get_ValidNode: Boolean;
    begin
      Result:= fNode <> nil
    end;
    
    procedure TOmniXmlEngine.Load(const aFilename: TFileName);
    begin
      fXmlDoc.Load(aFilename);
      SelectRootNode()
    end;
    
    procedure TOmniXmlEngine.SelectChild(const aIndex: Integer);
    begin
      fNode:= fNode.ChildNodes.Item[aIndex]
    end;
    
    procedure TOmniXmlEngine.SelectParent;
    begin
      fNode:= fNode.ParentNode
    end;
    
    procedure TOmniXmlEngine.SelectRootNode;
    begin
      fNode:= fXmlDoc.DocumentElement
    end;
    
    { TJediXmlEngine }
    
    constructor TJediXmlEngine.Create;
    begin
      inherited Create();
      fXmlDoc:= TJclSimpleXML.Create();
    end;
    
    destructor TJediXmlEngine.Destroy;
    begin
      fXmlDoc.Free();
      inherited Destroy()
    end;
    
    function TJediXmlEngine.Get_ChildsCount: Integer;
    begin
      Result:= fNode.ChildsCount
    end;
    
    function TJediXmlEngine.Get_HaveParent: Boolean;
    begin
      Result:= fNode.Parent <> nil
    end;
    
    function TJediXmlEngine.Get_ID: string;
    begin
      Result:= 'JclSimpleXML';
    end;
    
    //function TJediXmlEngine.Get_NodeName: Boolean;
    //begin
    //  Result:= fNode.Name
    //end;
    
    function TJediXmlEngine.Get_ValidNode: Boolean;
    begin
      Result:= fNode <> nil
    end;
    
    procedure TJediXmlEngine.Load(const aFilename: TFileName);
    begin
      fXmlDoc.LoadFromFile(aFilename);
      SelectRootNode()
    end;
    
    procedure TJediXmlEngine.SelectChild(const aIndex: Integer);
    begin
      fNode:= fNode.Items[aIndex]
    end;
    
    procedure TJediXmlEngine.SelectParent;
    begin
      fNode:= fNode.Parent
    end;
    
    procedure TJediXmlEngine.SelectRootNode;
    begin
      fNode:= fXmlDoc.Root
    end;
    
    end.
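The test form relies on `TPerfCheck` from a `TZTools` unit that is not part of the listing. Anyone who wants to rerun the benchmark could substitute something like the sketch below. `TSimplePerfTimer` is a hypothetical stand-in, not the original class. The published figures suggest that each `Start` begins a fresh measurement (the OmniXML selection time is lower than its load time), so the sketch resets on `Start`; that is an inference, not something the source states. `Init` is left out because its semantics are unknown.

```pascal
program PerfTimerSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, DateUtils;

type
  // Hypothetical stand-in for TPerfCheck; not the original TZTools class.
  TSimplePerfTimer = class
  private
    fStartedAt: TDateTime;
    fElapsedMs: Int64;
    fRunning: Boolean;
  public
    procedure Start;            // begins a fresh measurement
    procedure Pause;            // freezes the elapsed time
    function TimeStr: string;   // e.g. "240 [ms]" or "1,09 [s]"
    property ElapsedMs: Int64 read fElapsedMs;
  end;

procedure TSimplePerfTimer.Start;
begin
  fElapsedMs := 0;
  fStartedAt := Now;
  fRunning := True;
end;

procedure TSimplePerfTimer.Pause;
begin
  if fRunning then
  begin
    // Now is coarse (typically millisecond-level at best), which is
    // acceptable for runs that last hundreds of milliseconds.
    fElapsedMs := MilliSecondsBetween(Now, fStartedAt);
    fRunning := False;
  end;
end;

function TSimplePerfTimer.TimeStr: string;
begin
  // The decimal separator follows the current locale settings.
  if fElapsedMs < 1000 then
    Result := Format('%d [ms]', [fElapsedMs])
  else
    Result := Format('%.2f [s]', [fElapsedMs / 1000]);
end;

var
  Timer: TSimplePerfTimer;
begin
  Timer := TSimplePerfTimer.Create;
  try
    Timer.Start;
    Sleep(50); // placeholder for aEngine.Load(aFilename)
    Timer.Pause;
    WriteLn('elapsed: ', Timer.TimeStr);
  finally
    Timer.Free;
  end;
end.
```

For finer resolution on Windows, a timer based on the high-resolution performance counter would be the better choice; `Now` is used here only to keep the sketch portable.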
    
  • 2020-12-04 15:24

    Give himXML by himitsu a try.

    It is released under the MPL v1.1, GPL v3.0 or LGPL v3.0 license.

    You will have to register at Delphi-Praxis, an excellent (German) Delphi site, to be able to download it:

    • himxml_246.7z

    It has a very impressive performance and the distribution includes demos demonstrating that. I've successfully used it in Delphi 2007, Delphi 2010 and Delphi XE.

  • 2020-12-04 15:26

    Some time ago I had to serialize a record to XML format, for example:

     TTest = record
        a : integer;
        b : real; 
     end;
    

    to

        <Data>
            <a type="tkInteger">value</a>
            <b type="tkFloat">value</b>
        </Data>

    I used RTTI to recursively navigate through the record fields and store their values in XML. I tried a few XML parsers. I didn't need a DOM model to create the XML, but I needed one to load it back.
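As a rough illustration of the RTTI approach, the sketch below walks the fields of a flat record and emits the same element layout as the sample above. It is not the author's code: `RecordToXml` is a hypothetical helper, it assumes Delphi 2010 or later (extended RTTI in the `Rtti` unit), it handles only one level of simple fields rather than recursing, and it builds a plain string instead of using any of the tested parsers.

```pascal
program RecordToXmlSketch;
{$APPTYPE CONSOLE}

uses
  SysUtils, TypInfo, Rtti;

type
  TTest = record
    a: Integer;
    b: Real;
  end;

// Hypothetical helper: serializes the fields of a flat record.
// Nested records, arrays and strings that need escaping are not handled.
function RecordToXml(const AInstance: Pointer;
  const ATypeInfo: PTypeInfo): string;
var
  Ctx: TRttiContext;
  Field: TRttiField;
  KindName: string;
begin
  Ctx := TRttiContext.Create;
  try
    Result := '<Data>';
    for Field in Ctx.GetType(ATypeInfo).GetFields do
    begin
      // Yields names such as 'tkInteger' or 'tkFloat'.
      KindName := GetEnumName(TypeInfo(TTypeKind),
        Ord(Field.FieldType.TypeKind));
      // TValue.ToString formats floats with the current locale settings.
      Result := Result + Format('<%0:s type="%1:s">%2:s</%0:s>',
        [Field.Name, KindName, Field.GetValue(AInstance).ToString]);
    end;
    Result := Result + '</Data>';
  finally
    Ctx.Free;
  end;
end;

var
  R: TTest;
begin
  R.a := 42;
  R.b := 1.5;
  WriteLn(RecordToXml(@R, TypeInfo(TTest)));
end.
```

A recursive version would check `Field.FieldType.TypeKind` for `tkRecord` and call itself on the nested field's address and type info.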

    The XML contained about 310k nodes (10-15 MB). The results are presented in the table below, which has 6 columns with times in seconds:
    1 - time for creating nodes and writing values
    2 - SaveToFile();
    3 = 1 + 2
    4 - LoadFromFile();
    5 - time for navigating through nodes and reading values
    6 = 4 + 5
    [Image: results table]

    MSXML/Xerces/ADOM are different vendors for TXMLDocument (DOMVendor).
    JanXML doesn't work with Unicode; I fixed some errors and saved the XML, but loading causes an AV (or a stack overflow, I don't remember).
    "manual" means writing the XML by hand using a TStringStream.
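For the "manual" variant, a sketch along the following lines shows what writing XML by hand to a TStringStream can look like. The helper names `EscapeXmlText` and `WriteElement` are hypothetical and not taken from the downloadable test; only the three characters that matter in element text are escaped.

```pascal
program ManualXmlSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: escapes the characters that are significant
// inside element text. '&' must be replaced first.
function EscapeXmlText(const S: string): string;
begin
  Result := StringReplace(S, '&', '&amp;', [rfReplaceAll]);
  Result := StringReplace(Result, '<', '&lt;', [rfReplaceAll]);
  Result := StringReplace(Result, '>', '&gt;', [rfReplaceAll]);
end;

// Hypothetical helper: writes one element in the layout used above.
// AName and AKind are assumed to be valid XML names already.
procedure WriteElement(const AStream: TStringStream;
  const AName, AKind, AValue: string);
begin
  AStream.WriteString(Format('<%0:s type="%1:s">%2:s</%0:s>',
    [AName, AKind, EscapeXmlText(AValue)]));
end;

var
  Stream: TStringStream;
begin
  Stream := TStringStream.Create('');
  try
    Stream.WriteString('<Data>');
    WriteElement(Stream, 'a', 'tkInteger', '42');
    WriteElement(Stream, 'note', 'tkString', '1 < 2 & 3 > 2');
    Stream.WriteString('</Data>');
    WriteLn(Stream.DataString);
  finally
    Stream.Free;
  end;
end.
```

No node objects are allocated at all in this approach, but the caller is responsible for well-formedness, and loading the file back still requires a parser.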

    I used Delphi 2010, Win7 x32, a Q8200 CPU at 2.3 GHz and 4 GB of RAM.

    Update: you can download the source code for this test (record serialization to XML using RTTI) here: http://blog.karelia.pro/teran/files/2012/03/XMLTest.zip All parsers (Omni, Native, Jan) are included (the node count in the XML is now about 270k). Sorry, there are no comments in the code.
