AWK set multiple delimiters for comma and quotes with commas

こ雲淡風輕ζ submitted on 2021-02-08 06:43:33

Question


I have a CSV file where columns are comma-separated, and columns whose text contains commas are quoted.

Sometimes the quoted text itself contains quotes, for example to denote inches, which results in even more quotes.

Textual data without embedded commas do not have quotes.

For example:

A,B,C
1,"hello, how are you",hello
2,car,bike
3,13.3 inch tv,"tv 13.3"""

How do I use awk to print the number of columns in each row? I should get:

3
3
3

I thought of using awk -F'[,"]', but I'm getting way more columns than there are.

Help appreciated.


Answer 1:


GNU awk has an extension to handle just such problematic CSV files. Let's consider this first without quotes embedded within quotes:

$ awk -v FPAT="([^,]+)|(\"[^\"]+\")" '{print NF}' file.csv
3
3
3

How it works

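Instead of defining fields by a separator, FPAT lets us define what a field looks like with a regular expression. Here a field is either a run of one or more non-comma characters, ([^,]+), or a double-quoted string, (\"[^\"]+\"). Because the regex match at a given position is the longest one possible, a quoted field that contains commas is taken whole by the second alternative rather than being cut at its first comma.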

For more detail, see the GNU manual.

Handling quotes embedded within quotes

In the revised version of the question, we have the line:

3,13.3 inch tv,"tv 13.3"""

In this extended case, double quotes can be included within the double quoted field if they themselves are doubled. To allow for this we extend the regex, as per lcd047's suggestion, to allow for an arbitrary number of such doubled-double-quotes within a field:

$ awk -v FPAT="([^,]+)|(\"([^\"]|\"\")+\")" '{print NF}' file.csv



Answer 2:


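If you care about the field contents, use @John1024's solution. Otherwise, if all you need is the field count, delete every quoted string (and with it any embedded commas) before counting the comma-separated fields: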

$ awk -F, '{gsub(/"[^"]+"/,""); print NF}' file
3
3
3
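To make it clearer why this gives the right count, the following sketch prints each record as it looks after the gsub, next to the resulting NF. It uses only standard awk features, so it should not need GNU awk; the input is the three data rows from the question.

```shell
# The three data rows from the question (header line left out).
rows='1,"hello, how are you",hello
2,car,bike
3,13.3 inch tv,"tv 13.3"""'

# Delete each quoted string, then show the field count and what is left of the record.
out=$(printf '%s\n' "$rows" |
  awk -F, '{ gsub(/"[^"]+"/, ""); print NF ": " $0 }')

printf '%s\n' "$out"
# 3: 1,,hello
# 3: 2,car,bike
# 3: 3,13.3 inch tv,""
```

In the last row the leftover "" is what remains of the doubled quote; it contains no comma, so the count is unaffected.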


Source: https://stackoverflow.com/questions/31083953/awk-set-multiple-delimiters-for-comma-and-quotes-with-commas
