Output whole line once for each unique value of a column (Bash)

故里飘歌 2020-12-17 22:06

This must surely be a trivial task with awk or otherwise, but it's left me scratching my head this morning. I have a file with a format similar to this:

<
4 Answers
  • 2020-12-17 22:39

    Just use sort:

    sort -k 2,2 -u file
    

    The -u option removes duplicate entries (as you wanted), and -k 2,2 makes field 2 the only sort key, so the rest of each line is ignored when checking for duplicates. Note that the output is ordered by field 2, not by the original line order.
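
    As a rough illustration of what that command does, here is a small sketch. The sample lines are taken from the outputs posted in this thread (with whitespace collapsed), not from the asker's full file, and the names `sample_lines` and `sorted_unique` are made up for the example.

    ```shell
    # Sample lines copied from outputs in this thread (whitespace collapsed);
    # the asker's full file is not shown here.
    sample_lines() {
        printf '%s\n' \
            'pep> AEYTCVAETK 2 genes ADUm.1024,ADUm.5198,ADUm.750' \
            'pep> AIQLTGK 1 genes ADUm.1999,ADUm.3560' \
            'pep> KHEPPTEVDIEGR 5 genes ADUm.367' \
            'pep> VSSILEDKTT 9 genes ADUm.1192,ADUm.2731' \
            'pep> AIQLTGK 10 genes ADUm.1999,ADUm.3560' \
            'pep> VSSILEDKILSR 3 genes ADUm.2146,ADUm.5750'
    }

    # LC_ALL=C pins byte-wise collation so the order does not depend on the locale.
    sorted_unique=$(sample_lines | LC_ALL=C sort -k 2,2 -u)

    # Prints five lines, one per distinct value of field 2, ordered by that field.
    printf '%s\n' "$sorted_unique"
    ```

    POSIX only promises that one line per key survives, not which one; GNU sort documents that it keeps the first of each equal run.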

  • 2020-12-17 22:45

    One way using awk:

    awk '!array[$2]++' file.txt
    

    Results:

    pep> AEYTCVAETK     2   genes ADUm.1024,ADUm.5198,ADUm.750
    pep> AIQLTGK        1   genes ADUm.1999,ADUm.3560
    pep> KHEPPTEVDIEGR  5   genes ADUm.367
    pep> VSSILEDKTT     9   genes ADUm.1192,ADUm.2731
    pep> VSSILEDKILSR   3   genes ADUm.2146,ADUm.5750
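
    The one-liner is terse: `array[$2]` is zero the first time a value of field 2 appears, so `!array[$2]++` is true and awk prints the line by default, and the post-increment makes the test false for every later repeat. A longer sketch of the same idea follows. The function name `first_per_key` and the array name `seen` are made up for the example, and the sample lines are taken from outputs posted in this thread with whitespace collapsed.

    ```shell
    # Keep the first line seen for each distinct value of field 2, in input order.
    first_per_key() {
        awk '
            !($2 in seen) { print }   # value not recorded yet: print the whole line
            { seen[$2] = 1 }          # record the value so later repeats are skipped
        '
    }

    deduped=$(
        printf '%s\n' \
            'pep> AEYTCVAETK 2 genes ADUm.1024,ADUm.5198,ADUm.750' \
            'pep> AIQLTGK 1 genes ADUm.1999,ADUm.3560' \
            'pep> KHEPPTEVDIEGR 5 genes ADUm.367' \
            'pep> VSSILEDKTT 9 genes ADUm.1192,ADUm.2731' \
            'pep> AIQLTGK 10 genes ADUm.1999,ADUm.3560' \
            'pep> VSSILEDKILSR 3 genes ADUm.2146,ADUm.5750' |
        first_per_key
    )

    # The second AIQLTGK line (count 10) is dropped; the other five lines keep their order.
    printf '%s\n' "$deduped"
    ```

    Unlike the sort approach, this keeps the original line order and always keeps the first occurrence.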
    
  • 2020-12-17 22:50

    I would use Perl for this:

    perl -nae 'print unless exists $seen{$F[1]}; undef $seen{$F[1]}' < input.txt
    

    The -n switch processes the input line by line, and the -a switch splits each line on whitespace into the @F array, so $F[1] is the second field.

  • 2020-12-17 22:51
    awk '{if($2==temp){next;}else{print}temp=$2}' your_file
    

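    Tested below. This compares each line only with the previous one, so it drops adjacent repeats only: AIQLTGK appears twice in the output because its two lines are not next to each other in the input.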

    > awk '{if($2==temp){next;}else{print}temp=$2}' temp
    pep> AEYTCVAETK         2       genes ADUm.1024,ADUm.5198,ADUm.750
    pep> AIQLTGK            1       genes ADUm.1999,ADUm.3560
    pep> KHEPPTEVDIEGR      5       genes ADUm.367
    pep> VSSILEDKTT         9       genes ADUm.1192,ADUm.2731
    pep> AIQLTGK            10      genes ADUm.1999,ADUm.3560
    pep> VSSILEDKILSR       3       genes ADUm.2146,ADUm.5750
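
    If the input may contain repeats that are not adjacent, one possible variant (an assumption, not part of the answer above) is to sort on field 2 first so that equal values become neighbours, at the cost of losing the original line order. In this sketch the sample lines are the output above with whitespace collapsed, and `prev` and `deduped_sorted` are made-up names.

    ```shell
    deduped_sorted=$(
        printf '%s\n' \
            'pep> AEYTCVAETK 2 genes ADUm.1024,ADUm.5198,ADUm.750' \
            'pep> AIQLTGK 1 genes ADUm.1999,ADUm.3560' \
            'pep> KHEPPTEVDIEGR 5 genes ADUm.367' \
            'pep> VSSILEDKTT 9 genes ADUm.1192,ADUm.2731' \
            'pep> AIQLTGK 10 genes ADUm.1999,ADUm.3560' \
            'pep> VSSILEDKILSR 3 genes ADUm.2146,ADUm.5750' |
        # Sort on field 2 only; LC_ALL=C keeps the ordering locale-independent.
        LC_ALL=C sort -k 2,2 |
        # Print a line only when its field 2 differs from the previous line.
        awk '$2 != prev { print } { prev = $2 }'
    )

    # Prints five lines, one per distinct value of field 2.
    printf '%s\n' "$deduped_sorted"
    ```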
    