Awk: treat a double-quoted string as one token and ignore the spaces inside it

没有蜡笔的小新 2020-12-15 17:26

Data file - data.txt:

ABC "I am ABC" 35 DESC
DEF "I am not ABC" 42 DESC

cat data.txt | awk '{print $2}'

will re

7 answers
  •  挽巷 (OP)
    2020-12-15 18:03

    The top answer for this question only works for lines with a single quoted field. When I found this question I needed something that could work for an arbitrary number of quoted fields.

    Eventually I came upon an answer by Wintermute in another thread, and he provided a good generalized solution to this problem. I've just modified it to remove the quotes. Note that you need to invoke awk with -F\" when running the below program.

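    # Invoke with -F\" so that the double quote is the field separator.
    BEGIN { OFS = "" }  # rejoin the fields with nothing, which drops the quotes
    {
        # Odd-numbered fields lie outside the quotes: turn their whitespace into commas.
        for (i = 1; i <= NF; i += 2) {
            gsub(/[ \t]+/, ",", $i)
        }
        print
    }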
    

    This works by observing that, when you split each line on the " character, every other field lies inside quotes. The loop therefore visits only the fields outside the quotes and replaces each run of whitespace in them with a comma. Because OFS is empty, the fields are joined back together without the quotes.
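
    To make the alternation visible, here is a small sketch that prints each field of the first sample line from the question when " is the separator. The bracketed output format and the variable name fields are only for illustration.

```shell
# Split the first sample line on the double quote and show every field.
# Field 2 is the quoted text; fields 1 and 3 are the unquoted parts.
fields=$(printf '%s\n' 'ABC "I am ABC" 35 DESC' |
  awk -F'"' '{ for (i = 1; i <= NF; i++) printf "%d:[%s]\n", i, $i }')
printf '%s\n' "$fields"
# 1:[ABC ]
# 2:[I am ABC]
# 3:[ 35 DESC]
```

    Only fields 1 and 3 contain whitespace that should become commas, which is why the loop in the program steps by 2.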

    You can then easily chain another instance of awk to do whatever processing you need (just use the field separator switch again, -F,).
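
    As a sketch of that chaining, the pipeline below runs the program above on the two sample lines from the question and then uses a second awk with -F, to print the second field. The shell variables data and second are illustrative names, and the sketch assumes that no quoted field contains a comma, since a comma inside the quotes would split that field in the second stage.

```shell
# Sample data from the question, held in a variable instead of data.txt.
data='ABC "I am ABC" 35 DESC
DEF "I am not ABC" 42 DESC'

# Stage 1: split on the double quote and comma-separate the unquoted parts.
# Stage 2: split on commas and print the (formerly quoted) second field.
second=$(printf '%s\n' "$data" |
  awk -F'"' 'BEGIN { OFS = "" } { for (i = 1; i <= NF; i += 2) gsub(/[ \t]+/, ",", $i); print }' |
  awk -F, '{ print $2 }')
printf '%s\n' "$second"
# I am ABC
# I am not ABC
```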

    Note that this might break if the first field is quoted - I haven't tested it. If it does, though, it should be easy to fix by adding an if statement to start at 2 rather than 1 if the first character of the line is a ".
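
    A quick check, which is not part of the original answer, suggests that the quoted-first-field case works without any change. With " as the separator, a leading quote produces an empty first field, so the odd-numbered fields still lie outside the quotes. The input line below is made up for illustration, and this is a single test case rather than a guarantee for every input.

```shell
# Hypothetical input whose first field is quoted.
quoted_first=$(printf '%s\n' '"I am ABC" ABC 35' |
  awk -F'"' 'BEGIN { OFS = "" } { for (i = 1; i <= NF; i += 2) gsub(/[ \t]+/, ",", $i); print }')
printf '%s\n' "$quoted_first"
# I am ABC,ABC,35
```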
