CSV parsing with Commons CSV - Quotes within quotes causing IOException

Submitted by [亡魂溺海] on 2019-12-01 06:16:28

The problem here is that the quotes inside the quoted value are not escaped, and your parser doesn't handle that. Try univocity-parsers, as it is the only Java parser I know of that can handle unescaped quotes inside a quoted value. It is also 4 times faster than Commons CSV. Try this code:

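import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;
import com.univocity.parsers.csv.UnescapedQuoteHandling;

//configure the parser to stop a quoted value at its closing quote,
//keeping any unescaped quotes found before it
CsvParserSettings settings = new CsvParserSettings();
settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_CLOSING_QUOTE);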

//create the parser
CsvParser parser = new CsvParser(settings);

//parse your line
String[] out = parser.parseLine("116,6,2,29 Sep 10,\"\"JJ\" (60 min)\",\"http://www.tvmaze.com/episodes/4855/criminal-minds-6x02-jj\"");

for(String e : out){
    System.out.println(e);
}

This will print:

116
6
2
29 Sep 10
"JJ" (60 min)
http://www.tvmaze.com/episodes/4855/criminal-minds-6x02-jj

Hope it helps.

Disclosure: I'm the author of this library. It's open source and free (Apache 2.0 license).

Quoting mainly exists so that a field can contain separator characters. If embedded quotes in a field are not escaped, that cannot work, so there isn't any point in using quotes. If your example value were "JJ", 60 Min, how would a parser know that the comma is part of the field? This data format can't handle embedded commas reliably, so if you need that, it is best to change the source so that it generates RFC 4180-compliant CSV.
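To make the RFC-compliant option concrete, here is a minimal sketch of how a data source could quote its fields: a field is wrapped in double quotes when it contains a comma, a quote, or a line break, and each embedded quote is doubled. The class and method names (RfcCsvQuoting, quoteField, toCsvLine) are made up for this illustration and are not part of Commons CSV or any other library.

```java
import java.util.StringJoiner;

// Hypothetical helper, for illustration only.
final class RfcCsvQuoting {

    // Quote a field only when it needs it; embedded quotes are doubled,
    // so "" inside a quoted field stands for one literal ".
    static String quoteField(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Join already-separated field values into one CSV line.
    static String toCsvLine(String... fields) {
        StringJoiner line = new StringJoiner(",");
        for (String field : fields) {
            line.add(quoteField(field));
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Prints: 116,6,2,29 Sep 10,"""JJ"" (60 min)"
        System.out.println(toCsvLine("116", "6", "2", "29 Sep 10", "\"JJ\" (60 min)"));
    }
}
```

A line written this way keeps embedded commas and quotes unambiguous, so a standard CSV parser should be able to read it back without special settings.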

Otherwise, it looks like the data source is simply surrounding non-numeric fields with quotes and separating the fields with commas, so the parser needs to do the reverse. You should probably just treat the data as comma-delimited and strip the leading/trailing quotes yourself, for example with StringUtils.removeStart/removeEnd from Commons Lang.

You might use CSVFormat.withQuote(null) to disable quote handling, or forget about that and just use String.split(",").
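As a rough sketch of the split-and-strip approach, the following uses only the standard library. It uses startsWith/endsWith in place of removeStart/removeEnd, and the class and method names (NaiveQuoteStripper, stripOuterQuotes, parseLine) are invented for this example. It assumes that no field ever contains a comma, which is exactly the limitation described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
final class NaiveQuoteStripper {

    // Remove one leading and one trailing double quote, if present.
    static String stripOuterQuotes(String field) {
        String s = field;
        if (s.startsWith("\"")) {
            s = s.substring(1);
        }
        if (s.endsWith("\"")) {
            s = s.substring(0, s.length() - 1);
        }
        return s;
    }

    // Split on every comma; this breaks if a field itself contains a comma.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        for (String raw : line.split(",", -1)) { // -1 keeps trailing empty fields
            fields.add(stripOuterQuotes(raw));
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "116,6,2,29 Sep 10,\"\"JJ\" (60 min)\","
                + "\"http://www.tvmaze.com/episodes/4855/criminal-minds-6x02-jj\"";
        for (String field : parseLine(line)) {
            System.out.println(field);
        }
    }
}
```

For the sample line this prints six fields, with the fifth being "JJ" (60 min) and the URL without its surrounding quotes.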

I think that having both quotation marks and spaces in the same token is what confuses the parser. Try this:

CSVFormat csvFormat = CSVFormat.DEFAULT.withQuote('"').withQuote(' ');

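That should fix it. Note that the second withQuote call replaces the first, so the effective quote character is a space and the double quotes are kept as ordinary characters.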


Example

For your input line:

String line = "116,6,2,29 Sep 10,\"\"JJ\" (60 min)\",\"http://www.tvmaze.com/episodes/4855/criminal-minds-6x02-jj\"";

Output is (and no exception is thrown):

[116, 6, 2, 29 Sep 10, ""JJ" (60 min)", "http://www.tvmaze.com/episodes/4855/criminal-minds-6x02-jj"]

You can use withEscape('\\') to make the backslash the escape character, so that backslash-escaped quotes within a quoted value are accepted:

CSVFormat csvFormat = CSVFormat.DEFAULT.withEscape('\\');

Reference: https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html
