I am using Ruby's CSV.read on a very large file. From time to time the library encounters poorly formatted lines and raises an error, for instance:
"Illegal quoting in line 53657."
The liberal_parsing option is available starting in Ruby 2.4 for cases like this. From the documentation:
When set to a true value, CSV will attempt to parse input not conformant with RFC 4180, such as double quotes in unquoted fields.
To enable it, pass it as an option to the CSV read/parse/new methods:
CSV.read(filename, liberal_parsing: true)
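For illustration, here is a minimal sketch of the difference on a single line, using CSV.parse_line instead of CSV.read so it runs without a file. The sample string is made up, and this assumes a Ruby/csv version that supports liberal_parsing (2.4 or later).

```ruby
require 'csv'

# Made-up sample: double quotes inside an unquoted field.
line = '123,456,a"b"c'

# Strict (default) parsing rejects the line.
strict_error =
  begin
    CSV.parse_line(line)
    nil
  rescue CSV::MalformedCSVError => e
    e.message
  end

# Liberal parsing keeps the quotes as literal characters in the field.
fields = CSV.parse_line(line, liberal_parsing: true)

puts "strict:  #{strict_error.inspect}"
puts "liberal: #{fields.inspect}"
```

Liberal parsing is a best-effort mode, so it is worth spot-checking a few of the affected rows to confirm the fields come out the way you expect.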
I had this problem in a line like 123,456,a"b"c
The problem is that the CSV parser expects " characters, if they appear, to entirely surround the comma-delimited text.
Solution: use a quote character other than " that I was sure would not appear in my data:
CSV.read(filename, :quote_char => "|")
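A small sketch of what this changes, again with CSV.parse_line and made-up sample strings. It also shows the trade-off: once " is no longer the quote character, fields that really were quoted are not treated as quoted any more.

```ruby
require 'csv'

# With "|" as the quote character, the stray double quotes are ordinary text.
stray = CSV.parse_line('123,456,a"b"c', quote_char: '|')

# Trade-off: a genuinely quoted field containing a comma is now split in two,
# and the double quotes stay in the data.
split = CSV.parse_line('"x,y",z', quote_char: '|')

p stray
p split
```

So this approach is probably only safe when the file does not rely on double quotes to protect commas or newlines inside fields.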
Don't let CSV both read and parse the file.
Just read the file yourself, hand each line to CSV.parse_line, and rescue any exceptions it throws.
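One way this could look is sketched below. The helper name parse_tolerantly and the choice to collect bad lines with their line numbers are my own assumptions, not part of the CSV library. It takes any enumerable of lines, so File.foreach(filename) can be passed in directly.

```ruby
require 'csv'

# Hypothetical helper: parses each line on its own and separates good rows
# from lines the parser rejects. Returns [rows, bad_lines].
def parse_tolerantly(lines)
  rows = []
  bad_lines = []

  lines.each_with_index do |line, index|
    begin
      row = CSV.parse_line(line)
      rows << row if row # parse_line returns nil for an empty string
    rescue CSV::MalformedCSVError => e
      # Remember the 1-based line number and the parser's message.
      bad_lines << [index + 1, e.message]
    end
  end

  [rows, bad_lines]
end

# Usage with made-up data; for a real file: parse_tolerantly(File.foreach(filename))
rows, bad_lines = parse_tolerantly(["1,2,3\n", "123,456,a\"b\"c\n", "x,y,z\n"])
p rows
p bad_lines.map(&:first)
```

Note that parsing line by line assumes no field contains an embedded newline; a quoted field spanning several lines would be reported as malformed here.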
Apparently this error can also be caused by unprintable BOM characters. This thread suggests using a file mode to force a conversion, which is what finally worked for me.
require 'csv'
CSV.open(@filename, 'r:bom|utf-8') do |csv|
  # do something
end
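To check this on your own machine, a self-contained sketch along these lines may help. It writes a small made-up file that starts with a UTF-8 BOM into a temporary directory and reads it back with the same 'r:bom|utf-8' mode; the helper name read_csv_skipping_bom is hypothetical.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical helper: opens the file with a mode that strips a leading
# UTF-8 BOM, then reads all rows.
def read_csv_skipping_bom(path)
  CSV.open(path, 'r:bom|utf-8') { |csv| csv.read }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'with_bom.csv')
  # "\xEF\xBB\xBF" is the UTF-8 byte order mark, placed before a quoted field.
  File.binwrite(path, "\xEF\xBB\xBF\"id\",\"name\"\n\"1\",\"x\"\n")
  p read_csv_skipping_bom(path)
end
```

The BOM matters here because it sits in front of the first opening quote, so the parser sees a quote in the middle of a field rather than at its start.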
Try setting the quote character to one that never appears in the data, such as the null byte "\x00", so that a stray " is treated as ordinary text:
require 'csv'
CSV.foreach(file, headers: :first_row, quote_char: "\x00") do |line|
  p line
end
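For a quick check without a file, the same options can be passed to CSV.parse. This is only a sketch with made-up data, assuming the null byte never occurs in your input.

```ruby
require 'csv'

# Made-up input: a header row, then a row with stray double quotes.
data = "id,note\n1,a\"b\"c\n"

# "\x00" as quote_char means a double quote is never treated as quoting.
table = CSV.parse(data, headers: :first_row, quote_char: "\x00")
records = table.map(&:to_h)

p records
```

As with any replacement quote character, double quotes stay in the field values, so you may want to strip them afterwards if the file mixes quoted and unquoted fields.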