Bash How to wrap values of the first line of a csv file with quotations, if they do not exist

穿精又带淫゛_ 提交于 2020-03-23 06:38:31

问题


The other day I asked how to wrap values of the first line of a csv file with quotations. I was given this reply which worked great.

$ cat file.csv  
word1,word2,word3,word4,word5  
12345,12346,12347,12348,12349  

To put quotes around the items in the first line only:

$ sed '1 { s/^/"/; s/,/","/g; s/$/"/ }' file.csv  
"word1","word2","word3","word4","word5"  
12345,12346,12347,12348,12349 

I now need to test if the quotes exist around the values to eliminate chances of double quoting values.


回答1:


This problem suits awk more than sed due to row/column processing:

awk 'BEGIN{FS=OFS=","} NR==1 {
   for (i=1; i<=NF; i++) {gsub(/^"|"$/, "", $i); $i = "\"" $i "\""}
} 1' file

"word1","word2","word3","word4","word5"
12345,12346,12347,12348,12349
  • Using gsub function we remove leading or trailing double quote, if it exists
  • Then we can safely wrap each cell in double quotes



回答2:


Change each of the substitutions to include optional quotes:

sed -E '1 { s/^"?/"/; s/"?,"?/","/g; s/"?$/"/ }' file.csv

I have added -E to enable extended mode, so that ? is understood to mean "0 or 1 match".

You could also keep on using basic mode (no -E) and replace each ? with either \{0,1\} (again, 0 or 1 match) or * (which matches 0 or more).




回答3:


Regular expressions with sed and awk are subject to a seemingly never-ending series of edge cases that fail. Leveraging a csv library instead provides a great deal more robustness.

I found Python's library was the best choice because it's:

  1. widely available without onerous dependencies, with the exception of Python itself;
  2. not particular sensitive to the version of Python you use;
  3. lends itself to being embedded within a shell script; and
  4. is quite compact (a one-liner will do!).

Thus, my solution is along the lines of:

QUOTE_CSV_PY='import sys; import csv; csv.writer(sys.stdout, quoting=csv.QUOTE_ALL).writerows(csv.reader(sys.stdin))'
head -1 file.csv | python -c "$QUOTE_CSV_PY"; tail -n +2 file.csv

To break it down:

  • QUOTE_CSV_PY is a shell variable containing the Python one-liner commands
  • The Python commands simply import the standard sys and csv modules. It then creates a csv writer that writes to stdout with QUOTE_ALL set so all fields get quoted. It is fed a csv reader that reads from stdin.
  • head -1 sends the first line to the python interpreter for processing.
  • ; tail -n +2 waits until the processing is done and then just dumps out every line from number two onwards.



回答4:


Keep your existing working sed command, by removing all possible double quotes first:

sed '1 { s/"//g; s/^/"/; s/,/","/g; s/$/"/ }' file.csv 



回答5:


To test each answer I created three files:

file.csv

word1,word2,word3,word4,word5  
12345,12346,12347,12348,12349 

file2.csv

"word1","word2","word3","word4","word5"  
12345,12346,12347,12348,12349

file3.csv

"word1",word2,word3,"word4",word5  
12345,12346,12347,12348,12349

Then I created a bash script

#!/bin/bash  

sed -E '1 { s/^"?/"/; s/"?,"?/","/g; s/"?$/"/ }' file.csv > final.csv  
sed -E '1 { s/^"?/"/; s/"?,"?/","/g; s/"?$/"/ }' file2.csv > final2.csv  
sed -E '1 { s/^"?/"/; s/"?,"?/","/g; s/"?$/"/ }' file3.csv > final3.csv 

Then I looked at the final files and the first lines were perfect.

# cat final*.csv  

"word1","word2","word3","word4","word5"  
12345,12346,12347,12348,12349  
"word1","word2","word3","word4","word5"  
12345,12346,12347,12348,12349  
"word1","word2","word3","word4","word5"  
12345,12346,12347,12348,12349  


来源:https://stackoverflow.com/questions/46287556/bash-how-to-wrap-values-of-the-first-line-of-a-csv-file-with-quotations-if-they

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!