Strange EOutOfMemory exception using TStringList

柔情痞子 提交于 2019-12-25 09:32:44

问题


I have a system that loads some text files that are zipped into a ".log" file and parse then into informational classes using multiple threads that each deals with a different file and adds the parsed objects to a list. The file is loaded using TStringList, since it was the fastest method that I tested.

The number of text files is variable but normally I have to deal with something between 5 to 8 files ranging from 50Mb to 120Mb in one incursion.

My problem: The user can load the .log files as many times they desire, and after some of those processes I receive an EOutOfMemory exception when trying to use TStringList.LoadFromFile. Of course, the first thing that comes to mind to anyone that has ever used a StringList is that you should not use it when dealing with big textfiles, but this exception happens randomly and after the process has already been completed successfully at least once (the objects are destroyed before the start of a new parsing so the memory is retrieved correctly apart from some minor leaks)

I tried using textile and TStreamReader but it's not as fast as TStringList and the duration of the process is the greatest concern with this feature.

I'm using 10.1 Berlin, the parse process is a simple iteration trough the list of varied length lines and construction of objects based on the line info.

Essentially, my question is, what is causing this and how can i fix it. I may use other ways to load the file and read its contents but it must be as fast (or better) as the TStringList method.

Loading thread execute code:

TThreadFactory= class(TThread)
  protected
     // Class that holds the list of Commands already parsed, is owned outside of the thread
    _logFile: TLogFile;
    _criticalSection: TCriticalSection;
    _error: string;

    procedure Execute; override;
    destructor Destroy; override;

  public
    constructor Create(AFile: TLogFile; ASection: TCriticalSection); overload;

    property Error: string read _error;

  end;

implementation


{ TThreadFactory}

    constructor TThreadFactory.Create(AFile: TLogFile; ASection: TCriticalSection);
    begin
      inherited Create(True);
      _logFile := AFile;

      _criticalSection := ASection;
    end;


    procedure TThreadFactory.Execute;
        var
          tmpLogFile: TStringList;
          tmpConvertedList: TList<TLogCommand>;
          tmpCommand: TLogCommand;
          tmpLine: string;
          i: Integer;
        begin
          try
            try
              tmpConvertedList:= TList<TLogCommand>.Create;       

                if (_path <> '') and not(Terminated) then
                begin

                  try
                    logFile:= TStringList.Create;
                    logFile.LoadFromFile(tmpCaminho);

                    for tmpLine in logFile do
                    begin
                      if Terminated then
                        Break;

                      if (tmpLine <> '') then
                      begin
                        // the logic here was simplified that's just that 
                        tmpConvertedList.Add(TLogCommand.Create(tmpLine)); 
                      end;
                    end;
                  finally
                    logFile.Free;
                  end;

                end;


              _cricticalSection.Acquire;

              _logFile.AddCommands(tmpConvertedList);
            finally
              _cricticalSection.Release;

              FreeAndNil(tmpConvertedList);    
            end;
          Except
            on e: Exception do
              _error := e.Message;
          end;
        end;

    end.     

Added: Thank you for all your feedback. I will address some issues that were discussed but I failed to mention in my initial question.

  • The .log file has multiple instances of .txt files inside but it can also have multiple .log files, each file represents a day worth of logging or a period selected by the user, since the decompression takes a lot of time a thread is started every time a .txt is found so I can start parsing immediately, this has shortened the noticeable waiting time for the user

  • The "minor leaks" are not shown by ReportMemoryLeaksOnShutdown and other methods like TStreamReader avoid this issue

  • The list of commands is held by TLogFile. There is only one instance of this class at any time and is destroyed whenever the user wants to load a .log file. All threads add commands to the same object, that's the reason for the critical section.

  • Can't detail the parse process since it would disclose some sensible information, but it's a simple information gathering from the string and the TCommand

  • Since the beginning I was aware of fragmentation but I never found concrete proof that TStringList causes the fragmentation only by loading multiple times, if this can be confirmed I would be very glad

Thank you for you attention. I ended up using an external library that was capable of reading lines and loading files with the same speed as TStringList without the need to load the whole file into memory

https://github.com/d-mozulyov/CachedTexts/tree/master/lib


回答1:


  1. TStringList is slow class per se. It has a lot of -bells and whistles- extra features and functions that bog it down. Much faster containers would be TList<String> or plain old dynamic array of string. See System.IOUTils.TFile.ReadAllLines function.

  2. Read about Heap Memory Fragmentation, for example http://en.wikipedia.org/Heap_fragmentation

It can happen and break your application even without memory leaks. But since you say there are many small leaks - that is what most probably happen. You can more or less delay the crash by avoiding reading whole files into memory and operating with smaller chunks. But degradation would still go on, even slower, and in the end your program would crash again.

  1. There are a lot of ad hoc classes libraries, reading large files piece after piece with buffering, pre-fetching and what not. One of such kind of libraries, targeted at texts, is http://github.com/d-mozulyov/CachedTexts and there are others too.

PS. General notes.

I think your team should reconsider how much need for multithreading you have. Frankly, I see none. You are loading files from HDD and probably you write processed and transformed files to the same (at best to some another) HDD. That means, your program speed is limited with disk speed. And that speed is MUCH less than speeds of CPU and RAM. By introducing multithreading you seem only to make your program more complex and fragile. Errors are much harder to detect, well known libraries may suddenly misbehave in MT mode, etc. And you probably get no performance increase, because the bottleneck is at disk I/O speed.

If you still want multithreading for the sake of it - then perhaps look into OmniThreading Library. It was designed to simplify developing "data streams" types of MT applications. Read the tutorials and examples.

I definitely suggest you to squash all those "some minor leaks" and as part of it to fix all compilation warnings. I know, it is hard when you are not the only programmer at the project and others do not care. Still "minor leaks" means none on your team knows how the program actually behaves or behaved. And non-deterministic random behavior in multi-threading environment can easily generate tonnes of random Shroeden-bugs which you would never be able to reproduce and fix.

Your try-finally pattern really is broken. The variable you clean up in finally block should be assigned right before try block, not within it!

o := TObject.Create;
try
  ....
finally
  o.Destroy;
end;

This is correct way:

  • either the object fails to be created - then try-block would not be entered and neither would be finally-block.
  • or the object is successfully created - and then try-block would be entered and so would be finally-block

So, sometimes,

o := nil;
try
  o := TObject.Create;
  ....
finally
  o.Free;
end;

This is also correct. The variable is set to be nil immediately before try-block is entered. If object creation fails, then when finally-blocks calls Free method the variable was already assigned, and TObject.Free (but NOT TObject.Destroy) was designed to be able to work on nil object references. By itself is just a noisy, overly verbose modification of the first one, but it serves as a foundation to few more derivatives.

That pattern may be used when you do not know would you create an object or not.

o := nil;
try
  ...
  if SomeConditionCheck() 
     then o := TObject.Create;  // but maybe not
  ....
finally
  o.Free;
end;

Or when object creation is delayed, because you need to calculate some data for its creation, or because the object is very heavy (for example globally blocking access to some file) so you strive to keep its lifetime as short as possible.

o := nil;
try
  ...some code that may raise errors
  o := TObject.Create; 
  ....
finally
  o.Free;
end;

That code though asks why the said "...some code" was not moved outside and before the try-block. Usually it can and should be. A rather rare pattern.

One more derivative from that pattern is used when creating several objects;

o1 := nil;
o2 := nil;
o3 := nil;
try
  o2 := TObject.Create;
  o3 := TObject.Create;
  o1 := TObject.Create;
  ....
finally
  o3.Free;
  o2.Free;
  o1.Free;
end;

Goal is, if for example o3 object creation fails, then o1 would get freed and o2 was not created and the Free calls in finally-block would know it.

That is semi-correct. It is assumed that destructing objects would never raise its own exceptions. Usually that assumption is correct, but not always. Anyway, this pattern lets you fuse several try-finally blocks into one, which makes source code shorter (easier to read and reason about) and execution a little bit faster. Usually this is also reasonably safe, but not always.

Now two typical misuses of the pattern:

o := TObject.Create;
..... some extra code here
try
  ....
finally
  o.Destroy;
end;

If the code BETWEEN object creation and try-block raises some error - then there is no anyone to free that object. You just got a memory leak.

When you read Delphi sources you see maybe there a similar pattern

with TObject.Create do
try
  ....some very short code
finally
  Destroy;
end;

With all the wide-spread zeal against any use of with construct, this pattern precludes adding extra code between object creation and try-guarding. The typical with drawbacks - possible namespaces collision and inability to pass this anonymous object to other functions as an argument - are included.

One more unlucky modification:

o := nil;
..... some extra code here
..... that does never change o value
..... and our fortuneteller warrants never it would become
..... we know it for sure
try
  ....
  o := TObject.Create;
  ....
finally
  o.Free;
end;

This pattern is technically correct, but rather fragile at that. You do not immediately see the link between o := nil line and the try-block. When you would develop the program in the future, you may easily forget it and introduce the errors: like copy-pasting/moving the try-block into another function and forgetting the nil-initializing. Or extending the in-between code and making it use (thus - change) value of that o. There is one case i sometimes use it, but it is very rare and comes with risks.

Now,

...some random code here that does not
...initialize o variable, so the o contains
...random memory garbage here
try
  o := TObject.Create;
  ....
finally
  o.Destroy; // or o.Free
end;

This is what you write a lot without thinking how try-finally works and why it was invented. The problem is simple: when you enter the try-block your o variable is a container with random garbage. Now when you try to create the object, you may face some error raised. What then? Then you go into the finally-block and call (random-garbage).Free - and what should it do? It would do the random garbage.

So, to repeat all the above.

  1. try-finally is used to warrant object freeing or any other variables cleanup (closing files, closing windows, etc), and hence:
  2. the variable used to track that resource (such as object reference) should have well known value on the entrance into the try-block, this it should be assigned (initialized) before the try keyword. If you guard the file - then open it immediately before try. If you guard against memory leak - create the object before try. Etc. Do not do our first initialization AFTER try operator - WITHIN try-block - it is too late there.
  3. you better design the code as simple (self-evident) as you can, eliminating potential to introduce future errors when you would forget non-explicit hidden assumptions you keep in the corner of your mind today - and would cross them. See Who wrote this programing saying? "Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live." . Here it means, initialize (assign) the variable guarded by try-block IMMEDIATELY before the block starts, right above the try keyword. Better even, insert an empty line before that assignment. Make it jump into your (or any other reader's) eyes that this variable and this try are mutually dependent and should never be broken apart.


来源:https://stackoverflow.com/questions/46911835/strange-eoutofmemory-exception-using-tstringlist

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!