Question
I need to select the 7th column from a tab-delimited file, e.g.:
cat filename | awk '{print $7}'
The problem is that the 4th column can contain several values separated by blanks, as in the last line of the output below:
user \Adminis FL_vol Design 0 - 1 -
group 0 FL_vol Design 19324481 - 3014 -
user \MAK FL_vol Design 16875161 - 2618 -
tree 826 FL_vol Out Global Doc Mark 16875162 - 9618 - /vol/FL_vol/Out Global Doc Mark
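A minimal sketch of the problem, assuming (as the question says) that the real file has single tabs between fields and that the forum rendering turned them into spaces. The function names sample_line and default_awk_7th are hypothetical, used only for illustration.

```shell
#!/bin/sh
# Hypothetical sample: the last line from the question, rebuilt with real
# tab characters between fields (assumption: the original file uses tabs).
sample_line() {
    printf 'tree\t826\tFL_vol\tOut Global Doc Mark\t16875162\t-\t9618\t-\t/vol/FL_vol/Out Global Doc Mark\n'
}

# awk's default FS splits on runs of blanks AND tabs, so the spaces inside
# "Out Global Doc Mark" create extra fields and $7 lands on the wrong value.
default_awk_7th() {
    sample_line | awk '{print $7}'
}

default_awk_7th    # prints: Mark  (the intended 7th column is 9618)
```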
Answer 1:
If the data is unambiguously tab-separated, then cut will cut on tabs, not spaces:
cut -f7 filename
You can certainly do that with awk, too:
awk -F'\t' '{ print $7 }'
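A quick way to check that both commands pick the same column is to feed them a line whose 4th field contains spaces. This is a sketch assuming a hypothetical tab-separated sample line; the function names are illustrative only.

```shell
#!/bin/sh
# Hypothetical sample line from the question, with real tabs between fields.
sample_line() {
    printf 'tree\t826\tFL_vol\tOut Global Doc Mark\t16875162\t-\t9618\t-\t/vol/FL_vol/Out Global Doc Mark\n'
}

# cut -f7: cut splits on single tab characters by default.
cut_7th() {
    sample_line | cut -f7
}

# awk -F'\t': every tab (and only a tab) separates fields.
awk_7th() {
    sample_line | awk -F'\t' '{ print $7 }'
}

cut_7th    # prints: 9618
awk_7th    # prints: 9618
```

One difference worth knowing: awk's default FS collapses runs of blanks, whereas both cut and awk -F'\t' treat every single tab as a separator, so two consecutive tabs produce an empty field.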
Answer 2:
If the fields are separated by tabs and your concern is that some fields contain spaces, there is no problem here; just use:
cut -f 7
(cut uses the tab as its default field delimiter.)
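To illustrate that default, the sketch below compares cut without -d against cut with the tab passed explicitly, and shows that a field containing spaces comes back whole. The sample line and function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical tab-separated sample: the 4th field contains spaces.
sample_line() {
    printf 'tree\t826\tFL_vol\tOut Global Doc Mark\t16875162\t-\t9618\t-\t/vol/FL_vol/Out Global Doc Mark\n'
}

tab=$(printf '\t')

# No -d option: cut uses its default delimiter, the tab character.
cut_default()  { sample_line | cut -f 7; }

# The same selection with the tab given explicitly via -d.
cut_explicit() { sample_line | cut -d "$tab" -f 7; }

# Spaces are ordinary data to cut, so field 4 stays in one piece.
cut_field4()   { sample_line | cut -f 4; }

cut_default     # prints: 9618
cut_explicit    # prints: 9618
cut_field4      # prints: Out Global Doc Mark
```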
Answer 3:
Judging by the format of your input file, you can get away with delimiting on - instead of spaces:
awk 'BEGIN{FS="-"} {print $2}' filename
- FS stands for Field Separator; just think of it as the delimiter for input.
- Given that we are now delimiting on -, what was your 7th field now becomes the 2nd field.
- Save a cat! Specify the input file filename as an argument to awk instead.
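A sketch of what the - trick actually returns, assuming the two hypothetical sample lines below (spaces as displayed in the question). Note that $2 keeps the blanks around the value, and the trick only holds while no other - appears earlier in the line (a hyphenated user name, for example, would shift the fields). The trimming step with gsub is an addition for illustration, not part of the original answer.

```shell
#!/bin/sh
# Hypothetical sample: two lines from the question, blanks as displayed.
sample() {
    printf 'user \\Adminis FL_vol Design 0 - 1 -\n'
    printf 'tree 826 FL_vol Out Global Doc Mark 16875162 - 9618 - /vol/FL_vol/Out Global Doc Mark\n'
}

# Split on "-": the value between the first and second "-" is $2,
# including the blanks that surround it.
dash_field() {
    sample | awk 'BEGIN{FS="-"} {print $2}'
}

# Same split, with leading/trailing blanks stripped from $2 before printing.
dash_field_trimmed() {
    sample | awk 'BEGIN{FS="-"} {gsub(/^[ \t]+|[ \t]+$/, "", $2); print $2}'
}

dash_field_trimmed    # prints 1, then 9618
```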
Alternatively, if your data fields are separated by tabs, you can do it more explicitly as follows:
awk 'BEGIN{FS="\t"} {print $7}' filename
This resolves the issue, since the words in Out Global Doc Mark appear to be separated by spaces, not tabs.
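To confirm that FS="\t" keeps the multi-word column together, this sketch prints fields 4 and 7 side by side. It assumes a hypothetical tab-separated sample line; the "|" between them is just a visible separator chosen for the example.

```shell
#!/bin/sh
# Hypothetical sample line from the question, with real tabs between fields.
sample_line() {
    printf 'tree\t826\tFL_vol\tOut Global Doc Mark\t16875162\t-\t9618\t-\t/vol/FL_vol/Out Global Doc Mark\n'
}

# FS="\t": only tabs separate fields, so the spaces stay inside field 4.
fields_4_and_7() {
    sample_line | awk 'BEGIN{FS="\t"} {print $4 "|" $7}'
}

fields_4_and_7    # prints: Out Global Doc Mark|9618
```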
Answer 4:
This might work for you (GNU sed):
sed -r 's/(([^\t]*)\t?){7}.*/\2/' file
This substitute command matches the whole line and replaces it with the 7th run of non-tab characters, i.e. the 7th field. When a group (...) is repeated, a back-reference on the right-hand side of the substitution returns what that group matched in its last repetition. Here the first back-reference (\1) would return both the non-tab characters and the tab character (if present; note the ? meta-character, which matches zero or one of the preceding pattern), so the inner group \2 is used to get the field alone. The .* just swallows up whatever is left on the line, if anything.
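A sketch that runs the sed command on two hypothetical tab-separated sample lines and compares it with cut -f7. It assumes GNU sed, since -r and \t inside a bracket expression are GNU extensions; the function names are illustrative only.

```shell
#!/bin/sh
# Hypothetical tab-separated sample built from two lines of the question.
sample() {
    printf 'user\t\\Adminis\tFL_vol\tDesign\t0\t-\t1\t-\n'
    printf 'tree\t826\tFL_vol\tOut Global Doc Mark\t16875162\t-\t9618\t-\t/vol/FL_vol/Out Global Doc Mark\n'
}

# Assumes GNU sed. ([^\t]*)\t? consumes one field plus its tab; {7} repeats
# that seven times; \2 keeps what the inner group matched last (field 7);
# .* consumes the rest of the line so it is dropped from the output.
sed_7th() {
    sample | sed -r 's/(([^\t]*)\t?){7}.*/\2/'
}

sed_7th    # prints 1, then 9618
```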
Source: https://stackoverflow.com/questions/13795196/select-a-particular-column-using-awk-or-cut-or-perl