Limit CsvListReader to one line

邮差的信 提交于 2019-12-11 19:04:27

问题


I am working on application which processes large CSV files (several hundreds of MB's). Recently I faced a problem which at first looked as a memory leak in application, but after some investigation, it appears that it is combination of bad formatted CSV and attempt of CsvListReader to parse never-ending line. As a result, I got following exception:

at java.lang.OutOfMemoryError.<init>(<unknown string>)
at java.util.Arrays.copyOf(<unknown string>)
   Local Variable: char[]#13624
at java.lang.AbstractStringBuilder.expandCapacity(<unknown string>)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(<unknown string>)
at java.lang.AbstractStringBuilder.append(<unknown string>)
at java.lang.StringBuilder.append(<unknown string>)
   Local Variable: java.lang.StringBuilder#3
at org.supercsv.io.Tokenizer.readStringList(<unknown string>)
   Local Variable: java.util.ArrayList#642
   Local Variable: org.supercsv.io.Tokenizer#1
   Local Variable: org.supercsv.io.PARSERSTATE#2
   Local Variable: java.lang.String#14960
at org.supercsv.io.CsvListReader.read(<unknown string>)

By analyzing heap dump and CSV file based on dump findings, I noticed that one of columns in one of CSV lines was missing closing quotes, which obviously resulted in reader trying to find end of the line by appending file content to internal string buffer until there was no more heap memory.

Anyway, that was the problem and it was due to bad formatted CSV - once I removed critical line, problem disappeared. What I want to achieve is to tell reader that:

  • All the content it should interpret always ends with new line character, even if quotes are not closed properly (no multi-line support)
  • Alternatively, to provide certain limit (in bytes) of the CSV line

Is there some clear way to do this in SuperCSV using CsvListReader (preferred in my case)?


回答1:


That issue has been reported, and I'm working on some enhancements (for a future major release) at the moment that should make both options a bit easier.

For now, you'll have to supply your own Tokenizer to the reader (so Super CSV uses yours instead of its own). I'd suggest taking a copy of Super CSV's Tokenizer and modifying with your changes. That way you don't have to modify Super CSV, and you won't waste time.



来源:https://stackoverflow.com/questions/15223593/limit-csvlistreader-to-one-line

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!