TStringList of objects taking up tons of memory in Delphi XE

Asked by 遇见更好的自我 on 2021-01-05 14:50

I'm working on a simulation program.

One of the first things the program does is read in a huge file (28 MB, about 79'000 lines), parse each line (about 150 field

10 Answers
  •  没有蜡笔的小新 · 2021-01-05 15:18

    Just one idea which may save memory.

    You could let the data stay in the original files, and just point to it from in-memory structures.

    For instance, this is what we do for browsing big log files almost instantly: we memory-map the log file content, parse it quickly to create indexes of the useful information in memory, then read the content dynamically. No strings are created during the reading - only pointers to each line beginning, with dynamic arrays containing the needed indexes. Calling TStringList.LoadFromFile would be definitely much slower and more memory consuming.

    The code is open source, as part of our Synopse framework - see the TSynLogFile class. The trick is to read the file only once, and make all indexes on the fly.
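
    As a minimal sketch of that trick (not the actual TSynLogFile code - the class and field names here are made up, and error handling is reduced to the essentials), here is how a file can be memory-mapped with the Win32 API and every line beginning indexed in a single pass:

    // requires Windows and SysUtils in the uses clause
    type
      TLineIndexer = class
      private
        fFile, fMap: THandle;
        fMapStart, fMapEnd: PAnsiChar;
        fLines: array of PAnsiChar; // pointer to the start of each line
        fCount: integer;
      public
        constructor Create(const aFileName: string);
        destructor Destroy; override;
        property Count: integer read fCount;
      end;

    constructor TLineIndexer.Create(const aFileName: string);
    var
      P: PAnsiChar;
      sizeLo, sizeHi: DWORD;
    begin
      fFile := CreateFile(PChar(aFileName), GENERIC_READ, FILE_SHARE_READ,
        nil, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
      if fFile = INVALID_HANDLE_VALUE then
        RaiseLastOSError;
      sizeLo := GetFileSize(fFile, @sizeHi); // a 28 MB file fits in the low part
      fMap := CreateFileMapping(fFile, nil, PAGE_READONLY, 0, 0, nil);
      fMapStart := MapViewOfFile(fMap, FILE_MAP_READ, 0, 0, 0);
      fMapEnd := fMapStart + sizeLo;
      // single pass over the mapped bytes: remember where each line begins
      SetLength(fLines, 1024);
      fCount := 0;
      P := fMapStart;
      while P < fMapEnd do
      begin
        if fCount = Length(fLines) then
          SetLength(fLines, fCount * 2); // grow geometrically
        fLines[fCount] := P;
        Inc(fCount);
        while (P < fMapEnd) and (P^ <> #10) do
          Inc(P); // scan to the end of the current line
        Inc(P);   // skip the #10 itself
      end;
    end;

    destructor TLineIndexer.Destroy;
    begin
      if fMapStart <> nil then
        UnmapViewOfFile(fMapStart);
      if fMap <> 0 then
        CloseHandle(fMap);
      if (fFile <> 0) and (fFile <> INVALID_HANDLE_VALUE) then
        CloseHandle(fFile);
      inherited;
    end;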

    For instance, here is how we retrieve a line of text from the UTF-8 file content:

    function TMemoryMapText.GetString(aIndex: integer): string;
    begin
      if (self = nil) or (cardinal(aIndex) >= cardinal(fCount)) then
        // nil instance or out-of-range index: return an empty string
        result := ''
      else
        // fLines[] holds a pointer to the beginning of each line within the
        // mapped buffer; decode its UTF-8 bytes into a string only on demand
        result := UTF8DecodeToString(fLines[aIndex],
          GetLineSize(fLines[aIndex], fMapEnd));
    end;
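
    Retrieval then happens only on demand (a hypothetical usage sketch - it assumes a constructor taking a file name and a Count property, and ProcessLine stands in for your own parsing):

    var
      map: TMemoryMapText;
      i: integer;
    begin
      map := TMemoryMapText.Create('simulation.txt');
      try
        for i := 0 to map.Count - 1 do
          ProcessLine(map.GetString(i)); // a string is created only when asked for
      finally
        map.Free;
      end;
    end;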
    

    We use the exact same trick to parse JSON content, and such a mixed approach is what the fastest XML access libraries use.

    To handle your high-level data, and query it fast, you may try to use dynamic arrays of records, together with our optimized TDynArray and TDynArrayHashed wrappers (in the same unit). Arrays of records consume less memory and are faster to search, because the data is not fragmented (and faster still if you use ordered indexes or hashes), and you will be able to have high-level access to the content (you can define custom functions to retrieve the data from the memory-mapped file, for instance). Dynamic arrays are not suited to fast deletion of items (or you would have to use lookup tables) - but you wrote that you are not deleting much data, so it won't be a problem in your case.
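
    To make the idea concrete, here is a minimal sketch of such record-based storage (the record layout is invented for illustration - your ~150 fields would go there - and the TDynArray calls in the trailing comment assume the SynCommons API mentioned above):

    type
      TSimRow = record // hypothetical layout standing in for the ~150 fields
        ID: integer;
        Timestamp: TDateTime;
        Value: double;
      end;
      TSimRowDynArray = array of TSimRow;

    var
      Rows: TSimRowDynArray;
      RowCount: integer;

    procedure AddRow(const aRow: TSimRow);
    begin
      // grow geometrically so appending 79'000 rows stays cheap
      if RowCount = Length(Rows) then
        SetLength(Rows, RowCount + RowCount shr 3 + 64);
      Rows[RowCount] := aRow;
      Inc(RowCount);
    end;

    // with the TDynArray wrapper, the same external-count pattern becomes:
    //   aDynArray.Init(TypeInfo(TSimRowDynArray), Rows, @RowCount);
    //   aDynArray.Add(aRow); // plus sorting/hashing helpers for fast lookup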

    So you won't have any duplicated structure any more: only the logic in RAM, and the data in memory-mapped file(s). I wrote "file(s)" here because the same logic could perfectly well map several source data files (you would need some "merge" and "live refresh" handling, AFAIK).
