Question
I am new to unix and would like to be able to do the following but am unsure how.
Take a text file with lines like:
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
And output this:
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
I would like the script to find, for each TR value, all the lines that have a unique Line value.
Thanks
Answer 1:
Since you are apparently O.K. with randomly choosing among the values for dir, day, TI, and stn, you can write:
sort -u -t ';' -k 1,1 -k 6,6 -s < input_file > output_file
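To try the command on the sample data from the question, something like the following sketch could be used. The temporary directory and the workdir variable are assumptions made for illustration; the sort invocation itself is unchanged. Note that the result comes out in sorted key order, so the P234 lines appear before the P567 lines, unlike the ordering shown in the question.

```shell
# Recreate the sample input from the question in a scratch directory
# (workdir is a hypothetical name used only for this demonstration).
workdir=$(mktemp -d)
cat > "$workdir/input_file" <<'EOF'
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
EOF

# Same command as above: keep one line per (field 1, field 6) pair.
sort -u -t ';' -k 1,1 -k 6,6 -s < "$workdir/input_file" > "$workdir/output_file"

# Prints four lines: P234/lowell, P234/worcester, P567/lowell, P567/worcester.
cat "$workdir/output_file"
```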
Explanation:
- The sort utility, "sort lines of text files", lets you sort, compare, and merge lines from files. (See the GNU Coreutils documentation.)
- The -u or --unique option, "output only the first of an equal run", tells sort that if two input lines are equal, then you only want one of them.
- The -k POS1[,POS2] or --key=POS1[,POS2] option, "start a key at POS1 (origin 1), end it at POS2 (default end of line)", tells sort where the "keys" are that we want to sort by. In our case, -k 1,1 means that one key consists of the first field (from field 1 through field 1), and -k 6,6 means that one key consists of the sixth field (from field 6 through field 6).
- The -t SEP or --field-separator=SEP option tells sort that we want to use SEP (in our case, ';') to separate and count fields. (Otherwise, it would think that fields are separated by whitespace, and in our case, it would treat the entire line as a single field.)
- The -s or --stable option, "stabilize sort by disabling last-resort comparison", tells sort that we only want to compare lines in the way that we've specified; if two lines have the same above-defined "keys", then they're considered equivalent, even if they differ in other respects. Since we're using -u, that means that one of them will be discarded. (If we weren't using -u, it would just mean that sort wouldn't reorder them with respect to each other.)
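Because sort reorders its input, the result will not follow the original line order. If keeping the first occurrence of each TR/Line pair in input order matters, one possible alternative (not part of the answer above, and offered only as a sketch) is the common awk "seen" idiom. The workdir variable and the array name seen are assumptions chosen for illustration.

```shell
# Recreate the sample input from the question in a scratch directory
# (workdir is a hypothetical name used only for this demonstration).
workdir=$(mktemp -d)
cat > "$workdir/input_file" <<'EOF'
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
EOF

# -F ';' splits each line on semicolons. seen[key]++ evaluates to 0 the
# first time a (field 1, field 6) pair is met, so !seen[key]++ is true
# only for first occurrences, and awk prints just those lines.
awk -F ';' '!seen[$1 FS $6]++' "$workdir/input_file" > "$workdir/output_file"

cat "$workdir/output_file"
```

With the sample data, this keeps the lines in the same order as the desired output shown in the question. Unlike the sort approach, it holds one array entry per distinct key in memory.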
Source: https://stackoverflow.com/questions/12546417/sorting-by-unique-values-of-multiple-fields-in-unix-shell-script