How do you parse a CSV file using gawk? Simply setting FS=\",\" is not enough, as a quoted field with a comma inside will be treated as multiple fields.
Here's what I came up with. Any comments and/or better solutions would be appreciated.
BEGIN { FS="," }
{
for (i=1; i<=NF; i++) {
f[++n] = $i
if (substr(f[n],1,1)=="\"") {
while (substr(f[n], length(f[n]))!="\"" || substr(f[n], length(f[n])-1, 1)=="\\") {
f[n] = sprintf("%s,%s", f[n], $(++i))
}
}
}
for (i=1; i<=n; i++) printf "field #%d: %s\n", i, f[i]
print "----------------------------------\n"
}
The basic idea is that I loop through the fields, and any field which starts with a quote but does not end with a quote gets the next field appended to it.