CSVReader - bug when using " for escape char

Submitted by 我与影子孤独终老i on 2019-12-02 12:56:12

Question


I am using OpenCSV.

I have a CSVReader that is trying to parse a CSV file.
The file uses " as the quote character, , as the separator, and " again as the escape character.

Note that the CSV contains cells like:

"ballet 24"" classes"
"\"  

which actually represent these values:

ballet 24" classes
\
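
The encoding used by these cells can be sketched in a few lines of Java. This is only an illustration of the convention (wrap the value in quotes and double any quote inside it); the class and method names are made up for the example and are not part of OpenCSV.

```java
// Hypothetical helper, not an OpenCSV class: shows how the raw values map to CSV cells.
class CsvQuoteSketch {

    // Wraps a value in double quotes and doubles every quote character inside it.
    static String quote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // In Java source, \" is a quote character and \\ is a single backslash.
        System.out.println(quote("ballet 24\" classes")); // "ballet 24"" classes"
        System.out.println(quote("\\"));                  // "\"
    }
}
```

Note that the backslash is not special under this convention, which is why the cell "\" holds a single backslash.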

Example:

"9/6/2014","3170168","123652278","Computer","2329043290","Bing and Yahoo! search","22951990789","voice lesson","Broad","0.00","0","1","3.00","0.00","0.00","0.00","7","0","",""
"9/6/2014","3170168","123652278","Smartphone","2329043291","Bing and Yahoo! search","22951990795","ballet class","Broad","0.00","0","1","1.00","0.00","0.00","0.00","0","0","",""
"9/6/2014","3170168","123652278","Smartphone","2329043291","Bing and Yahoo! search","22951990797","ballet 24"" classes","Broad","0.00","0","1","1.00","0.00","0.00","0.00","0","0","",""
"9/6/2014","3170168","123652278","Smartphone","2329043291","Bing and Yahoo! search","22951990797","ballet classes","Broad","0.00","0","1","1.00","0.00","0.00","0.00","0","0","",""
"9/6/2014","3170168","123652278","Computer","2329043291","Bing and Yahoo! search","22951990817","\","Broad","0.00","0","1","1.00","0.00","0.00","0.00","5","0","",""
"9/6/2014","3170168","123652278","Computer","2329043293","Bing and Yahoo! search","22951990850","zumba classes","Broad","0.00","0","1","7.00","0.00","0.00","0.00","5","0","",""
"9/6/2014","3170168","123652278","Smartphone","2329043293","Bing and Yahoo! search","22951990850","zumba classes","Broad","0.00","0","4","1.00","0.00","0.00","0.00","5","0","",""
"9/6/2014","3170168","123652278","Computer","2329043293","Bing and Yahoo! search","22951990874","zumba lessons","Broad","0.00","0","1","2.00","0.00","0.00","0.00","0","0","",""

My problem is that I cannot pass " as the escape character to the CSVReader constructor
(i.e. make it the same as the quote character).
If I do, CSVReader reads the whole CSV line as a single cell.

Has anyone else run into this bug, and how can I work around it?


Answer 1:


It will work if you go with the default settings for CSVReader.

See this open bug report, sourceforge.net/p/opencsv/bugs/83, which says:

Actually, it works fine, just not the way you think. Its defaults are comma for separator, quote for the quote character, and backslash for the escape character. However, it understands two consecutive quote characters as an escaped quote character. So, if you just go with the defaults, it will work fine.

By default, it treats two consecutive double quotes as one escaped double quote, but the configured escape character must still be a different character.

So the following works:

CSVReader reader = new CSVReader(new FileReader(App.class.getClassLoader().getResource("csv.csv").getFile()), ',','"','-');
  • comma as separator
  • double quote as quote char
  • dash (any other character) as escape character

At first I used '\' as the escape character, but then your field "\" would have to be changed to "\\" so that the escape character itself is escaped.
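
To see why the choice of escape character matters for this data, here is a simplified model of a quote-aware parser. It is a sketch written for this answer, not OpenCSV's actual implementation, and the class name and method are hypothetical. It assumes , as the separator and " as the quote character, and takes the escape character as a parameter.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model for illustration only; this is NOT how OpenCSV is implemented.
class EscapeCharSketch {

    // Parses one line with ',' as separator and '"' as quote character.
    // Inside quotes, "" yields one literal quote, and `escape` makes the next character literal.
    static List<String> parseLine(String line, char escape) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            boolean hasNext = i + 1 < line.length();
            if (inQuotes && c == escape && hasNext) {
                current.append(line.charAt(++i));  // escaped character is taken literally
            } else if (inQuotes && c == '"' && hasNext && line.charAt(i + 1) == '"') {
                current.append('"');               // doubled quote becomes one literal quote
                i++;                               // skip the second quote
            } else if (c == '"') {
                inQuotes = !inQuotes;              // opening or closing quote
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());    // separator outside quotes ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // The raw line is: "ballet 24"" classes","\","Broad"
        String line = "\"ballet 24\"\" classes\",\"\\\",\"Broad\"";
        System.out.println(parseLine(line, '-').size());  // 3 fields, as intended
        System.out.println(parseLine(line, '\\').size()); // 2 fields: the backslash swallows a closing quote
    }
}
```

With '-' as the escape character, the backslash is ordinary data and the line splits into three fields. With '\' as the escape character, the backslash in "\" escapes the closing quote, so the following separator is absorbed into the field.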




Answer 2:


CSVReader's default parser is not fully RFC 4180 compliant. Use OpenCSV's newer parser, RFC4180Parser, instead:

RFC4180Parser rfc4180Parser = new RFC4180ParserBuilder().build();
CSVReaderBuilder csvReaderBuilder = new CSVReaderBuilder(
    new FileReader("input.csv"));

CSVReader reader = csvReaderBuilder
    .withCSVParser(rfc4180Parser)
    .build();

To parse a single String formatted as a CSV line:

// The cell must include its enclosing quotes; parseLine can throw IOException.
String test = "\"ballet 24\"\" classes\"";
String[] columns = new RFC4180Parser().parseLine(test);

To read every row with the reader (alternatively, reader.readNext() reads one row at a time):

for (String[] line : reader.readAll()) {
  for (String s : line) {
    System.out.println(s);
  }
}

See http://opencsv.sourceforge.net/#rfc4180parser for more details.

Code taken from GeekPrompt




Answer 3:


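It cannot be done through CSVReader. As an alternative, Spark's CSV reader (here via PySpark) accepts " as the escape character:

from pyspark.sql import SparkSession

# Reuse the active session if there is one, otherwise create it.
spark = SparkSession.builder.getOrCreate()
# spark.read.csv returns a DataFrame; escape='"' makes "" inside a quoted field a literal quote.
df = spark.read.csv("csv.csv", multiLine=True, header=False, encoding="utf-8", escape='"')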


Source: https://stackoverflow.com/questions/38326033/csvreader-bug-when-using-for-escape-char
