Invalid char between encapsulated token and delimiter in Apache Commons CSV library

此生再无相见时 submitted on 2019-11-29 05:27:24

We ran into this issue when our data contained an embedded quote:

0,"020"1,"BS:5252525  ORDER:99999"4

The solution we applied was CSVFormat csvFileFormat = CSVFormat.DEFAULT.withQuote(null);

@Cuga's tip helped us resolve it. Thanks, @Cuga.

The full code is:

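    import java.io.FileReader;
    import java.io.IOException;
    import java.util.List;
    import org.apache.commons.csv.CSVFormat;
    import org.apache.commons.csv.CSVParser;
    import org.apache.commons.csv.CSVRecord;

    public class CsvQuoteExample {
        public static void main(String[] args) throws IOException {
            // Passing null disables quote handling, so quotes are read as literal characters.
            CSVFormat csvFileFormat = CSVFormat.DEFAULT.withQuote(null);
            String fileName = "test.csv";
            try (FileReader fileReader = new FileReader(fileName);
                 CSVParser csvFileParser = new CSVParser(fileReader, csvFileFormat)) {
                List<CSVRecord> csvRecords = csvFileParser.getRecords();
                for (CSVRecord csvRecord : csvRecords) {
                    System.out.println(csvRecord);
                }
            }
        }
    }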

The result is:

CSVRecord [comment=null, mapping=null, recordNumber=1, values=[0, "020"1, "BS:5252525  ORDER:99999"4]]

That line in the CSV file contains an invalid character between one of your cells and either the end of the line, the end of the file, or the next cell. A very common cause is a failure to escape your encapsulating character (the character used to "wrap" each cell, so the CSV parser knows where a cell (token) starts and ends).
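To make the failure condition concrete, here is a minimal sketch in plain Java of the kind of check described above: once a quoted cell closes, the next character must be the delimiter or the end of the line. This is not Commons CSV's actual implementation. The class and method names are hypothetical, and the sketch assumes single-line records, doubled quotes as escapes, and no tolerance for whitespace after a closing quote.

```java
public class QuoteCheck {
    /**
     * Returns the index of the first character that follows a closing quote
     * but is not the delimiter, or -1 if the line has no such character.
     * Simplified illustration only; a real CSV parser handles more cases.
     */
    static int firstInvalidCharAfterQuote(String line, char delimiter, char quote) {
        boolean inQuotes = false;
        boolean atFieldStart = true;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == quote) {
                    if (i + 1 < line.length() && line.charAt(i + 1) == quote) {
                        i++; // doubled quote: an escaped literal quote, skip it
                    } else {
                        inQuotes = false; // this quote closes the cell
                        if (i + 1 < line.length() && line.charAt(i + 1) != delimiter) {
                            return i + 1; // something other than the delimiter follows
                        }
                    }
                }
            } else if (c == delimiter) {
                atFieldStart = true; // next character begins a new cell
            } else {
                if (atFieldStart && c == quote) {
                    inQuotes = true; // a quote at the start of a cell opens it
                }
                atFieldStart = false;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The sample line from the first answer: '1' directly follows the closing quote.
        System.out.println(firstInvalidCharAfterQuote("0,\"020\"1,\"BS:5252525  ORDER:99999\"4", ',', '"'));
    }
}
```

For the sample line `0,"020"1,...` the check flags the `1` that directly follows the closing quote of `"020"`, which is the situation the error message describes.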

I found the solution to the problem. One of my CSV files has an attribute as follows: "attribute with nested "quote" "

Because of the nested quote in the attribute, the parser fails.

To avoid the above problem, escape each nested quote by doubling it: "attribute with nested ""quote"" "

This is one way to solve the problem.
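If you control how the file is written, the escaping can be applied programmatically. The following is a minimal sketch in plain Java, assuming the usual CSV convention (as in RFC 4180) that a quote inside a quoted cell is written as two quotes. The class and method names are hypothetical and not part of Commons CSV.

```java
public class CsvQuoting {
    /**
     * Wraps a value in double quotes and doubles every embedded double quote,
     * so a quote-aware CSV parser can read the cell back unchanged.
     */
    static String quoteField(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // The attribute from the answer above, with its nested quotes escaped.
        System.out.println(quoteField("attribute with nested \"quote\" "));
    }
}
```

In practice, writing the file with a CSV library's own writer (for example, Commons CSV's CSVPrinter) is likely a safer choice than hand-rolled escaping, since it also handles delimiters and line breaks inside cells.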

We ran into this same error with data containing quotes in otherwise unquoted input, e.g.:

some cell|this "cell" caused issues|other data

It was hard to find, but Apache's docs mention the withQuote() method, which can take null as a value.

We were getting the exact same error message and this (thankfully) ended up fixing the issue for us.
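As a rough sketch of how this might look for the pipe-delimited line above: the post does not show its format configuration, so the use of withDelimiter('|') here is an assumption, and the class and method names are hypothetical. It needs the Commons CSV jar on the classpath. In newer Commons CSV releases the with...() methods are deprecated in favor of a builder, so adjust to your version as needed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class PipeDelimitedExample {
    /** Parses CSV text with '|' as the delimiter and quote handling disabled. */
    static List<String> parseLine(String line) {
        // Assumption: pipe delimiter; withQuote(null) turns off quote processing.
        CSVFormat format = CSVFormat.DEFAULT.withDelimiter('|').withQuote(null);
        List<String> cells = new ArrayList<>();
        try (CSVParser parser = CSVParser.parse(line, format)) {
            for (CSVRecord record : parser) {
                for (String cell : record) {
                    cells.add(cell);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("some cell|this \"cell\" caused issues|other data"));
    }
}
```

With quoting disabled, the quotes in the middle cell are kept as ordinary characters. The trade-off is that genuinely quoted cells containing the delimiter will no longer be parsed as a single cell.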
