Parse a csv using awk and ignoring commas inside a field

抹茶落季 2020-11-29 04:36

I have a CSV file where each row defines a room in a given building. Along with the room, each row has a floor field. What I want to extract is all floors in all buildings.

7 answers

不知归路 (original poster)

2020-11-29 04:50
Since the real problem is distinguishing a comma inside a CSV field from one that separates fields, we can replace the first kind of comma with something else so that the data is easier to parse further, producing something like this:
```
0,"00BDF","AIRPORT TEST            "
0,0,"BRICKER HALL JOHN W    "
```
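This gawk script (replace-comma.awk) does that:
```
# Treat every single character as a record separator, so each one lands in RT.
BEGIN { RS = "(.)" }
# Count double quotes; an odd count means we are inside a quoted field.
RT == "\"" { inside++ }
# Replace commas inside quotes (here with nothing); pass everything else through.
{ if (inside % 2 && RT == ",") printf(""); else printf("%s", RT) }
```
This relies on a gawk feature: the text that actually matched the record separator is stored in a variable called RT. Setting RS to "(.)" makes every character a record separator, so the script sees the input one character at a time in RT. It counts double quotes as it goes, and whenever it meets a comma while the count is odd (that is, inside a quoted field) it prints the replacement string instead. The replacement is empty here, so the comma is simply dropped; any placeholder that cannot occur in the data would also work.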

The FPAT solution fails in one special case, where a field contains both escaped quotes and a comma inside the quotes, whereas this solution handles it. For example:
```
§ echo '"Adams, John ""Big Foot""",1' | gawk -vFPAT='[^,]*|"[^"]*"' '{ print $1 }'
"Adams, John "
§ echo '"Adams, John ""Big Foot""",1' | gawk -f replace-comma.awk | gawk -F, '{ print $1; }'
"Adams John ""Big Foot"""
```
As a one-liner for easy copy-paste:
```
gawk 'BEGIN { RS = "(.)" } RT == "\"" { inside++ } { if (inside % 2 && RT == ",") printf(""); else printf("%s", RT) }'
```