Dealing with commas in a CSV file

傲寒 2020-11-21 06:53

I am looking for suggestions on how to handle a CSV file that is created and then uploaded by our customers, and that may contain a comma inside a value, such as a company name.

27 answers

萌比男神i (original poster)

2020-11-21 07:07
If you are on a *nix system with sed available, and the unwanted comma(s) can occur only in one specific field of your CSV, you can use the following one-liner to enclose that field in double quotes ("), as RFC 4180, Section 2 proposes:
```
sed -r 's/([^,]*,[^,]*,[^,]*,)(.*)(,.*,.*)/\1"\2"\3/' inputfile
```
Depending on which field may contain the unwanted comma(s), you have to alter or extend the capturing groups of the regex (and the substitution).
The example above will enclose the fourth field (out of six) in quotation marks.
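As a quick illustration, here is a minimal sketch that runs the same substitution on a single made-up row. The sample values are hypothetical, and `-E` is used as the more portable spelling of GNU sed's `-r`:

```shell
# Hypothetical sample row: six fields, with a stray comma in the fourth one.
line='1,2020-11-21,open,Acme, Inc.,NY,US'

# Same regex as above: three fields before, the problem field, two fields after.
quoted=$(printf '%s\n' "$line" |
  sed -E 's/([^,]*,[^,]*,[^,]*,)(.*)(,.*,.*)/\1"\2"\3/')

printf '%s\n' "$quoted"
```

The greedy `(.*)` first grabs the rest of the line and then backtracks just far enough for the last group to claim the final two commas, so everything in between ends up inside the quotes.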

In combination with the --in-place (-i) option, you can apply these changes directly to the file.
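A cautious way to try the in-place edit might look like the sketch below. The file name input.csv and the temporary directory are assumptions for the demo; attaching a suffix to `-i` (here `.bak`) asks sed to keep a backup copy of the original file:

```shell
# Work on a throwaway copy so the example is safe to run.
tmpdir=$(mktemp -d)
printf '%s\n' 'a,b,c,Foo, Bar,e,f' > "$tmpdir/input.csv"

# -i.bak rewrites input.csv in place and keeps the original as input.csv.bak.
sed -E -i.bak 's/([^,]*,[^,]*,[^,]*,)(.*)(,.*,.*)/\1"\2"\3/' "$tmpdir/input.csv"

result=$(cat "$tmpdir/input.csv")      # the rewritten file
backup=$(cat "$tmpdir/input.csv.bak")  # the untouched original
printf '%s\n%s\n' "$result" "$backup"
```

Keeping the backup makes it easy to compare both versions before discarding the original.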

In order to "build" the right regex, there's a simple principle to follow:
1. For every field in your CSV that comes before the field with the unwanted comma(s) you write one [^,]*, and put them all together in a capturing group.
2. For the field that contains the unwanted comma(s) you write (.*).
3. For every field after the field with the unwanted comma(s) you write one ,.* and put them all together in a capturing group.
Here is a short overview of different possible regexes/substitutions depending on the specific field. If not given, the substitution is \1"\2"\3.
```
(.*)(,.*,.*)                     #first field (out of three fields), regex
"\1"\2                           #first field, substitution

([^,]*,[^,]*,)(.*)               #last field (out of three fields), regex
\1"\2"                           #last field, substitution


([^,]*,)(.*)(,.*,.*,.*)          #second field (out of five fields)
([^,]*,[^,]*,)(.*)(,.*)          #third field (out of four fields)
([^,]*,[^,]*,[^,]*,)(.*)(,.*,.*) #fourth field (out of six fields)
```
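The three rules for building the regex could also be automated. The function below is only a sketch: `build_quote_expr` is a hypothetical helper name, and the `^` and `$` anchors are an addition (not part of the one-liner above) so that the outer groups are never empty when the first or last field is chosen. It assumes a sed that supports `-E`:

```shell
# build_quote_expr FIELD TOTAL  (hypothetical helper, not part of sed)
# Prints a sed -E script that quotes field FIELD (1-based) out of TOTAL fields.
build_quote_expr() {
  field=$1 total=$2
  before='' after=''
  i=1
  while [ "$i" -lt "$field" ]; do   # rule 1: one [^,]*, per preceding field
    before="$before[^,]*,"
    i=$((i + 1))
  done
  i=$field
  while [ "$i" -lt "$total" ]; do   # rule 3: one ,.* per following field
    after="$after,.*"
    i=$((i + 1))
  done
  # rule 2: the problem field itself is matched by the middle group (.*)
  printf 's/(^%s)(.*)(%s$)/\\1"\\2"\\3/\n' "$before" "$after"
}

script=$(build_quote_expr 4 6)
printf '%s\n' 'a,b,c,Foo, Bar,e,f' | sed -E "$script"
```

Because the substitution is always `\1"\2"\3`, the same helper covers the first and last field without special cases.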
If you want to remove the unwanted comma(s) with sed instead of enclosing the field in quotation marks, refer to this answer.