Parsing a CSV file using gawk

感动是毒 2020-11-29 12:12

How do you parse a CSV file using gawk? Simply setting FS="," is not enough, because a quoted field with a comma inside will be treated as multiple fields.

9 answers
  •  佛祖请我去吃肉
    2020-11-29 12:32

    Here's what I came up with. Any comments and/or better solutions would be appreciated.

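    BEGIN { FS = "," }
    {
      n = 0   # reset the merged-field count so fields do not pile up across records
      for (i = 1; i <= NF; i++) {
        f[++n] = $i
        # A field that opens with a quote may contain commas: keep appending
        # input fields until it ends with an unescaped quote. The i < NF guard
        # stops the loop at end of record if the closing quote is missing.
        if (substr(f[n], 1, 1) == "\"") {
          while (i < NF && (length(f[n]) == 1 || substr(f[n], length(f[n])) != "\"" || substr(f[n], length(f[n]) - 1, 1) == "\\")) {
            f[n] = f[n] "," $(++i)
          }
        }
      }
      for (i = 1; i <= n; i++) printf "field #%d: %s\n", i, f[i]
      print "----------------------------------\n"
    }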
    

    The basic idea is to loop through the comma-separated fields and, whenever a field starts with a quote but does not end with an unescaped quote, append the following field to it (putting the comma back) until it does.
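
    As a possible alternative to merging fields by hand, gawk's FPAT variable describes what a field looks like rather than what separates fields. The following is a minimal sketch, assuming gawk 4.0 or later (FPAT is a gawk extension and is not available in POSIX awk) and assuming quoted fields contain no escaped quotes or embedded newlines:

    ```awk
    # Sketch only: relies on the gawk-specific FPAT variable.
    BEGIN {
      # A field is either a run of non-comma characters (possibly empty)
      # or a double-quoted string, which may itself contain commas.
      FPAT = "([^,]*)|(\"[^\"]+\")"
    }
    {
      # Print each field on its own line, in the same layout as the loop above.
      for (i = 1; i <= NF; i++) printf "field #%d: %s\n", i, $i
      print "----------------------------------\n"
    }
    ```

    With an input line such as a,"b,c",d this should give three fields, with the second one still wrapped in its quotes. Unlike the loop above, this pattern does not try to handle backslash-escaped quotes, so it is a trade of simplicity against coverage, not a full CSV parser.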
