FileHelpers throws OutOfMemoryException when parsing large CSV file

You must work record by record in this way:

  // Requires the System.IO, System.IO.Compression and FileHelpers namespaces.
  string fileName = @"c:\myfile.csv.gz";
  using (var fileStream = File.OpenRead(fileName))
  using (var gzipStream = new GZipStream(fileStream, CompressionMode.Decompress, false))
  using (TextReader textReader = new StreamReader(gzipStream))
  {
      var engine = new FileHelperAsyncEngine<CSVItem>();
      using (engine.BeginReadStream(textReader))
      {
          // The engine yields one record at a time, so only the current
          // record is held in memory.
          foreach (var record in engine)
          {
              // Work with each item
          }
      }
  }

If you use this async approach, you will only use the memory for one record at a time, and it will be much faster.
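For reference, a minimal CSVItem record class might look like the sketch below; the delimiter and field names are assumptions, since the original question does not show the file layout:

  // Hypothetical record layout for the examples here; adjust the
  // delimiter and fields to match your actual file.
  using FileHelpers;

  [DelimitedRecord(",")]
  public class CSVItem
  {
      public int Id;
      public string Name;
  }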

This isn't a complete answer, but if you have a 20 GB CSV file, you'll need 20 GB+ of memory to store the whole thing at once, unless your reader keeps everything compressed in memory (unlikely). You need to read the file in chunks; the approach you're using, putting everything into an array, will not work unless you have huge amounts of RAM.

You need a loop a bit more like this:

// Pseudocode: CsvReader, CSVItem, ReadNextItem and DoWhatINeedWithCsvRow
// are illustrative names, not a specific library's API.
CsvReader reader = new CsvReader(filePath);
CSVItem item = reader.ReadNextItem();
while (item != null)
{
    DoWhatINeedWithCsvRow(item);
    item = reader.ReadNextItem();
}

.NET's garbage collector will then be smart enough to reclaim the old CSVItems as you go through them, provided you don't keep references to them hanging around.

A better version would read a chunk from the CSV (e.g. 10,000 rows), deal with all of those, then get another chunk, or create a task for DoWhatINeedWithCsvRow if you don't care about processing order; a sketch follows below.
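A rough sketch of that chunked variant, reusing the illustrative CsvReader and CSVItem names from the pseudocode above (ProcessChunk is a hypothetical callback):

// Requires System.Collections.Generic; reader is the CsvReader from above.
const int ChunkSize = 10000;
var chunk = new List<CSVItem>(ChunkSize);
CSVItem next;
while ((next = reader.ReadNextItem()) != null)
{
    chunk.Add(next);
    if (chunk.Count == ChunkSize)
    {
        ProcessChunk(chunk); // or hand off to a Task, if order doesn't matter
        chunk = new List<CSVItem>(ChunkSize);
    }
}
if (chunk.Count > 0)
    ProcessChunk(chunk); // don't forget the final partial chunk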
