Question
I have this line as an example from a CSV file:
2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"
I want to split it into an array. The immediate thought is to just split on commas, but some of the strings have commas in them, e.g. "Life and Living Processes, Life Processes", and these should stay as single elements in the array. Note also that there are two commas with nothing in between; I want to get these as empty strings.
In other words, the array I want to get is
[2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes","","",1,0,"endofline"]
I can think of hacky ways involving eval, but I'm hoping someone can come up with a clean regex to do it...
cheers, max
Answer 1:
str=<<EOF
2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"
EOF
require 'csv' # built in
p CSV.parse(str)
# That's it! However, empty fields appear as nil.
# Makes sense to me, but if you insist on empty strings then do something like:
parser = CSV.new(str)
parser.convert{|field| field.nil? ? "" : field}
p parser.readlines
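Since the question asks for a single flat array with integers and empty strings, a minimal sketch along the same lines could use CSV.parse_line with the built-in :numeric converter and map nil to "" afterwards. The names line and fields are illustrative only, and the csv library is assumed to be loadable via require 'csv'.

```ruby
require 'csv'

line = '2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"'

# parse_line returns one row as a flat array; :numeric turns "2412" into 2412.
# Unquoted empty fields come back as nil, so they are mapped to "" afterwards.
fields = CSV.parse_line(line, converters: :numeric).map { |f| f.nil? ? "" : f }

p fields
```

CSV.parse, by contrast, returns an array of rows, so a single line would come back wrapped in an outer array.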
Answer 2:
This is not a suitable task for regular expressions. You need a CSV parser, and Ruby has one built in:
http://ruby-doc.org/stdlib/libdoc/csv/rdoc/classes/CSV.html
And an arguably superior third-party library:
http://fastercsv.rubyforge.org/
Answer 3:
EDIT: I failed to read the Ruby tag. The good news is, the guide will explain the theory behind building this, even if the language specifics aren't right. Sorry.
Here is a fantastic guide to doing this:
http://knab.ws/blog/index.php?/archives/10-CSV-file-parser-and-writer-in-C-Part-2.html
and the CSV writer is here:
http://knab.ws/blog/index.php?/archives/3-CSV-file-parser-and-writer-in-C-Part-1.html
These examples cover the case of a quoted literal in a CSV file (which may or may not contain a comma).
Answer 4:
text=<<EOF
2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"
EOF
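x = []
# "\042" is the octal escape for a double quote.
# Even-indexed chunks lie outside quotes, odd-indexed chunks inside them.
text.chomp.split("\042").each_with_index do |y, i|
  i % 2 == 0 ? x << y.split(",") : x << y
end
print x.flatten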
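Output (note that it contains three empty strings although the input has only two empty fields, because the comma that follows a closing quote also produces an empty element):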
$ ruby test.rb
["2412", "21", "Which of the following is not found in all cells?", "Curriculum", "Life and Living Processes, Life Processes", "", "", "", "1", "0", "endofline"]
Answer 5:
This morning I stumbled across a CSV Table Importer project for Ruby on Rails. You may find the code helpful:
GitHub TableImporter
Answer 6:
My preference is @steenstag's solution, but an alternative is to use String#scan with the following regular expression.
r = /(?<![^,])(?:(?!")[^,\n]*(?<!")|"[^"\n]*")(?![^,])/
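If the variable str holds the string given in the example (with no trailing newline, which would prevent the last field from matching), we obtain:
puts str.scan r
which displays the following (the two empty fields appear as blank lines):
2412
21
"Which of the following is not found in all cells?"
"Curriculum"
"Life and Living Processes, Life Processes"


1
0
"endofline"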
See also regex101 which provides a detailed explanation of each token of the regex. (Move your cursor across the regex.)
Ruby's regex engine performs the following operations.
(?<![^,])   : negative lookbehind asserts current location is not preceded by a character other than a comma
(?:         : begin non-capture group
(?!")       : negative lookahead asserts next char is not a double-quote
[^,\n]*     : match 0+ chars other than a comma and newline
(?<!")      : negative lookbehind asserts preceding character is not a double-quote
|           : or
"           : match double-quote
[^"\n]*     : match 0+ chars other than double-quote and newline
"           : match double-quote
)           : end of non-capture group
(?![^,])    : negative lookahead asserts current location is not followed by a character other than a comma
Note that (?<![^,]) is the same as (?<=,|\A) and (?![^,]) is the same as (?=,|\z).
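To get from the scan result to the array the question asks for, the surrounding quotes still have to be removed. The following is a minimal sketch, assuming a single-line input; the variable fields and the quote-stripping step are illustrative additions, not part of the original answer. All values remain strings.

```ruby
# Regex from the answer above, unchanged.
r = /(?<![^,])(?:(?!")[^,\n]*(?<!")|"[^"\n]*")(?![^,])/

str = '2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"'

# chomp guards against a trailing newline, which would stop the last field from matching.
# scan keeps the quotes of quoted fields, so one leading and one trailing quote are stripped.
fields = str.chomp.scan(r).map { |f| f.sub(/\A"/, "").sub(/"\z/, "") }

p fields
```

This does not handle escaped quotes ("") inside quoted fields, which is one reason the CSV library remains the more robust choice.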
Source: https://stackoverflow.com/questions/3933065/how-do-i-split-apart-a-csv-string-in-ruby