What is the fastest way to parse a line in Delphi?

深忆病人 2020-12-13 01:18

I have a huge file that I must parse line by line. Speed is of the essence.

Example of a line:

Token-1   Here-is-the-Next-Token      La         


        
9 Answers
  •  情话喂你
    2020-12-13 01:51

    • Use PChar incrementing for speed of processing
    • If some tokens are not needed, only copy token data on demand
    • Copy the PChar to a local variable while actually scanning through characters, so the compiler can keep it in a register
    • Keep source data in a single buffer unless you must handle line by line, and even then, consider handling line processing as a separate token in the lexer recognizer
    • Consider processing a byte array buffer that has come straight from the file, if you definitely know the encoding; in Delphi 2009 and later, where PChar is an alias for PWideChar, use PAnsiChar instead of PChar, unless of course you know the encoding is UTF-16LE.
    • If you know that the only whitespace is going to be #32 (ASCII space), or a similarly limited set of characters, there may be some clever bit manipulation hacks that can let you process 4 bytes at a time using Integer scanning. I wouldn't expect big wins here though, and the code will be as clear as mud.
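
As a rough illustration of the PAnsiChar and copy-on-demand points above, here is a minimal sketch that walks a raw single-byte buffer with an explicit end pointer and only counts tokens, never copying them. CountTokensInBuffer is a hypothetical helper, not part of any library. The sketch assumes an ASCII-compatible encoding and treats every byte at or below #32 as whitespace. In real use the buffer would be filled from the file (for example with TFileStream.ReadBuffer) rather than from a constant.

```pascal
program CountTokensSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper: counts whitespace-separated tokens in the byte
// range [P, PEnd). Assumes a single-byte, ASCII-compatible encoding in
// which every byte <= #32 is whitespace. No token data is copied.
function CountTokensInBuffer(P, PEnd: PAnsiChar): Integer;
begin
  Result := 0;
  while P < PEnd do
  begin
    // skip leading whitespace
    while (P < PEnd) and (P^ <= #32) do
      Inc(P);

    if P < PEnd then
    begin
      Inc(Result);
      // skip over the token body without copying it
      while (P < PEnd) and (P^ > #32) do
        Inc(P);
    end;
  end;
end;

const
  // stand-in for bytes read from a file; this is the sample line from the question
  Sample: AnsiString = 'Token-1   Here-is-the-Next-Token      La         ';
var
  P: PAnsiChar;
begin
  P := PAnsiChar(Sample);
  WriteLn(CountTokensInBuffer(P, P + Length(Sample)));
end.
```

For the sample line from the question this prints 3. Using an end pointer instead of a #0 sentinel means the buffer does not need a terminator, at the cost of one extra comparison per character.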

    Here's a sample lexer that should be pretty efficient, but it assumes that all source data is in a single string. Reworking it to handle buffers is moderately tricky, because a very long token can straddle a buffer boundary.

    type
      TLexer = class
      private
        FData: string;
        FTokenStart: PChar;
        FCurrPos: PChar;
        function GetCurrentToken: string;
      public
        constructor Create(const AData: string);
        function GetNextToken: Boolean;
        property CurrentToken: string read GetCurrentToken;
      end;
    
    { TLexer }
    
    constructor TLexer.Create(const AData: string);
    begin
      FData := AData;
      FCurrPos := PChar(FData);
      FTokenStart := FCurrPos; // so CurrentToken is empty before the first GetNextToken call
    end;
    
    function TLexer.GetCurrentToken: string;
    begin
      SetString(Result, FTokenStart, FCurrPos - FTokenStart);
    end;
    
    function TLexer.GetNextToken: Boolean;
    var
      cp: PChar;
    begin
      cp := FCurrPos; // copy to local to permit register allocation
    
      // skip whitespace; this test could be converted to an unsigned int
      // subtraction and compare for only a single branch
      while (cp^ > #0) and (cp^ <= #32) do
        Inc(cp);
    
      // using the null terminator as the end-of-file marker
      Result := cp^ <> #0;
    
      if Result then
      begin
        FTokenStart := cp;
        Inc(cp);
        while cp^ > #32 do
          Inc(cp);
      end;
    
      FCurrPos := cp;
    end;
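
The comment in GetNextToken about folding the whitespace test into one unsigned comparison can be sketched as follows. This is only one way to read that remark, not necessarily what the author had in mind. IsSkippable is a hypothetical name, and the sketch uses AnsiChar for brevity. Subtracting 1 makes #0 wrap around to a very large unsigned value, so a single "less than 32" test covers both bounds of the original (c > #0) and (c <= #32) condition.

```pascal
program SingleBranchWhitespace;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper: True for #1..#32, False for #0 and anything above #32.
// Ord(c) - 1 is -1 for #0, which becomes a huge value once viewed as unsigned,
// so one comparison replaces the two-sided range test.
function IsSkippable(c: AnsiChar): Boolean;
begin
  Result := Cardinal(Ord(c) - 1) < 32;
end;

var
  b: Integer;
  Mismatches: Integer;
begin
  // compare against the two-comparison form for every possible byte value
  Mismatches := 0;
  for b := 0 to 255 do
    if IsSkippable(AnsiChar(b)) <> ((b > 0) and (b <= 32)) then
      Inc(Mismatches);
  WriteLn('mismatches: ', Mismatches);
end.
```

The program prints "mismatches: 0". Whether this is actually faster than the two-comparison loop depends on the compiler and CPU, so it is worth profiling before adopting it.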
    
