This must surely be a trivial task with awk or otherwise, but it's left me scratching my head this morning. I have a file with a format similar to this:
One way using awk, which prints only the first line seen for each distinct value in the second column:
awk '!array[$2]++' file.txt
Results:
pep> AEYTCVAETK 2 genes ADUm.1024,ADUm.5198,ADUm.750
pep> AIQLTGK 1 genes ADUm.1999,ADUm.3560
pep> KHEPPTEVDIEGR 5 genes ADUm.367
pep> VSSILEDKTT 9 genes ADUm.1192,ADUm.2731
pep> VSSILEDKILSR 3 genes ADUm.2146,ADUm.5750
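To see why the one-liner works, here is a small commented run. The input is a hypothetical three-line sample (the second line is an invented duplicate of `AIQLTGK`, not from the original file). The array name `array` is arbitrary; `seen` is another common choice.

```shell
#!/bin/sh
# Hypothetical sample input: lines 1 and 2 share the same value in field 2.
# array[$2]++ yields the count seen so far (0 the first time), then increments it.
# !0 is true, so awk's default action (print the whole line) fires only on the
# first occurrence of each field-2 value; later duplicates evaluate to !1, i.e. false.
# Longhand equivalent: awk '{ if (!($2 in seen)) { seen[$2] = 1; print } }'
result=$(printf '%s\n' \
  'pep> AIQLTGK 1 genes ADUm.1999,ADUm.3560' \
  'pep> AIQLTGK 4 genes ADUm.9999' \
  'pep> KHEPPTEVDIEGR 5 genes ADUm.367' |
  awk '!array[$2]++')
printf '%s\n' "$result"
```

Only the first `AIQLTGK` line and the `KHEPPTEVDIEGR` line are printed, in their original order, since awk keeps order and simply skips repeats.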