Output whole line once for each unique value of a column (Bash)

This must surely be a trivial task with awk or otherwise, but it's left me scratching my head this morning. I have a file with a format similar to this:

4 Answers

梦毁少年i

2020-12-17 22:39
Just use sort:
```
sort -k 2,2 -u file
```
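The -u option removes duplicate entries (as you wanted), and -k 2,2 restricts the sort key to field 2, so the rest of the line is ignored when checking for duplicates. Note that the output comes back sorted by that field rather than in the original input order.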
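To make that concrete, here is a minimal sketch. The sample file is an assumption: it is pieced together from the output lines quoted in the other answers, not taken from the question itself. `LC_ALL=C` is also my addition, so that keys are compared byte by byte regardless of the locale.

```shell
# Hypothetical sample input, reconstructed from output shown elsewhere in this thread.
sample=$(mktemp)
cat > "$sample" <<'EOF'
pep> AEYTCVAETK 2 genes ADUm.1024,ADUm.5198,ADUm.750
pep> AIQLTGK 1 genes ADUm.1999,ADUm.3560
pep> KHEPPTEVDIEGR 5 genes ADUm.367
pep> VSSILEDKTT 9 genes ADUm.1192,ADUm.2731
pep> AIQLTGK 10 genes ADUm.1999,ADUm.3560
pep> VSSILEDKILSR 3 genes ADUm.2146,ADUm.5750
EOF

# -k 2,2 limits the key to field 2; -u keeps one line per distinct key.
sort_out=$(LC_ALL=C sort -k 2,2 -u "$sample")
printf '%s\n' "$sort_out"

rm -f "$sample"
```

One line per distinct second field comes out, ordered by that field. POSIX only promises that all but one line of each equal-key group is suppressed, so it is safer not to depend on which of the two AIQLTGK lines survives.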

無奈伤痛

2020-12-17 22:45

One way using awk:

awk '!array[$2]++' file.txt

Results:

pep> AEYTCVAETK     2   genes ADUm.1024,ADUm.5198,ADUm.750
pep> AIQLTGK        1   genes ADUm.1999,ADUm.3560
pep> KHEPPTEVDIEGR  5   genes ADUm.367
pep> VSSILEDKTT     9   genes ADUm.1192,ADUm.2731
pep> VSSILEDKILSR   3   genes ADUm.2146,ADUm.5750
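In case the one-liner looks cryptic, the sketch below spells out what it does. The sample file is an assumption (rebuilt from output lines quoted in this thread), and the array name `seen` is just my choice in place of `array`.

```shell
# Hypothetical sample input, reconstructed from output shown elsewhere in this thread.
sample=$(mktemp)
cat > "$sample" <<'EOF'
pep> AEYTCVAETK 2 genes ADUm.1024,ADUm.5198,ADUm.750
pep> AIQLTGK 1 genes ADUm.1999,ADUm.3560
pep> KHEPPTEVDIEGR 5 genes ADUm.367
pep> VSSILEDKTT 9 genes ADUm.1192,ADUm.2731
pep> AIQLTGK 10 genes ADUm.1999,ADUm.3560
pep> VSSILEDKILSR 3 genes ADUm.2146,ADUm.5750
EOF

# seen[$2]++ yields the count *before* incrementing: 0 on the first sighting of a key,
# non-zero afterwards. Negating it makes the pattern true only the first time.
awk_out=$(awk '!seen[$2]++' "$sample")

# The same logic written out step by step.
awk_long=$(awk '{ if (!($2 in seen)) { print; seen[$2] = 1 } }' "$sample")

printf '%s\n' "$awk_out"

rm -f "$sample"
```

Both forms keep the first line seen for each value of field 2 and preserve the input order, at the cost of holding every distinct key in memory.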

刺人心

2020-12-17 22:50
I would use Perl for this:
```
perl -nae 'print unless exists $seen{$F[1]}; undef $seen{$F[1]}' < input.txt
```
The -n switch processes the input line by line, and the -a switch splits each line on whitespace into the @F array, so $F[1] is the second field.

小蘑菇

2020-12-17 22:51

awk '{if($2==temp){next;}else{print}temp=$2}' your_file

Tested below:

> awk '{if($2==temp){next;}else{print}temp=$2}' temp
pep> AEYTCVAETK         2       genes ADUm.1024,ADUm.5198,ADUm.750
pep> AIQLTGK            1       genes ADUm.1999,ADUm.3560
pep> KHEPPTEVDIEGR      5       genes ADUm.367
pep> VSSILEDKTT         9       genes ADUm.1192,ADUm.2731
pep> AIQLTGK            10      genes ADUm.1999,ADUm.3560
pep> VSSILEDKILSR       3       genes ADUm.2146,ADUm.5750
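Worth noting: comparing against the previous line's field only catches duplicates that sit next to each other, which is why AIQLTGK still appears twice in the output above. A rough sketch of the difference, assuming a sample file rebuilt from those output lines; the variable name `prev` and the `sort` step in front are my additions, not part of this answer.

```shell
# Hypothetical sample input, reconstructed from the output lines shown above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
pep> AEYTCVAETK 2 genes ADUm.1024,ADUm.5198,ADUm.750
pep> AIQLTGK 1 genes ADUm.1999,ADUm.3560
pep> KHEPPTEVDIEGR 5 genes ADUm.367
pep> VSSILEDKTT 9 genes ADUm.1192,ADUm.2731
pep> AIQLTGK 10 genes ADUm.1999,ADUm.3560
pep> VSSILEDKILSR 3 genes ADUm.2146,ADUm.5750
EOF

# Adjacent-only comparison on unsorted input: both AIQLTGK lines get through.
adjacent_out=$(awk '$2 != prev { print } { prev = $2 }' "$sample")

# Sorting on field 2 first brings equal keys together, so the same filter removes them.
sorted_out=$(LC_ALL=C sort -k 2,2 "$sample" | awk '$2 != prev { print } { prev = $2 }')

printf 'unsorted input: %s lines\n' "$(printf '%s\n' "$adjacent_out" | wc -l | tr -d ' ')"
printf 'sorted input:   %s lines\n' "$(printf '%s\n' "$sorted_out" | wc -l | tr -d ' ')"

rm -f "$sample"
```

The upside of the adjacent comparison is constant memory use; the downside is that the input has to be grouped by the key first, which gives up the original line order.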
