Sorting by unique values of multiple fields in UNIX shell script

本秂侑毒 提交于 2019-12-23 02:36:58

问题


I am new to unix and would like to be able to do the following but am unsure how.

Take a text file with lines like:

TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester

And output this:

TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester

I would like the script to be able to find all all the lines for each TR value that have a unique Line value.

Thanks


回答1:


Since you are apparently O.K. with randomly choosing among the values for dir, day, TI, and stn, you can write:

sort -u -t ';' -k 1,1 -k 6,6 -s < input_file > output_file

Explanation:

  • The sort utility, "sort lines of text files", lets you sort/compare/merge lines from files. (See the GNU Coreutils documentation.)
  • The -u or --unique option, "output only the first of an equal run", tells sort that if two input-lines are equal, then you only want one of them.
  • The -k POS[,POS2] or --key=POS1[,POS2] option, "start a key at POS1 (origin 1), end it at POS2 (default end of line)", tells sort where the "keys" are that we want to sort by. In our case, -k 1,1 means that one key consists of the first field (from field 1 through field 1), and -k 6,6 means that one key consists of the sixth field (from field 6 through field 6).
  • The -t SEP or --field-separator=SEP option tells sort that we want to use SEP — in our case, ';' — to separate and count fields. (Otherwise, it would think that fields are separated by whitespace, and in our case, it would treat the entire line as a single field.)
  • The -s or --stabilize option, "stabilize sort by disabling last-resort comparison", tells sort that we only want to compare lines in the way that we've specified; if two lines have the same above-defined "keys", then they're considered equivalent, even if they differ in other respects. Since we're using -u, that means that means that one of them will be discarded. (If we weren't using -u, it would just mean that sort wouldn't reorder them with respect to each other.)


来源:https://stackoverflow.com/questions/12546417/sorting-by-unique-values-of-multiple-fields-in-unix-shell-script

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!