Find the maximum values in nth column for each distinct values in 1st column in bash

拟墨画扇 提交于 2020-01-15 07:12:26

问题


I have a 3 column file and I want to find the maximum value of the third column with rows with same first column and have also the second column in output.

Input:

1   234   0.005
1   235   0.060
1   236   0.001
2   234   0.010
2   235   0.003
2   236   0.003
3   234   0.004
3   235   0.100
3   236   0.004

Desired output:

1   235   0.060
2   234   0.010
3   235   0.100

I found this hint from previous questions but I do not know how to have also the second column:

!($1 in max) || $3>max[$1] { max[$1] = $3 }
END {
     PROCINFO["sorted_in"] = "@ind_num_asc"
     for (key in max) {
         print key, max[key]
         }
     }

回答1:


You may use this awk:

awk '!($1 in max) || $3 > max[$1] { max[$1] = $3; two[$1] = $2 }
END { PROCINFO["sorted_in"] = "@ind_num_asc"
   for (i in max) print i, two[i], max[i]
}' file

1 235 0.060
2 234 0.010
3 235 0.100



回答2:


$ sort -k1n -k3nr file | uniq -w 1
1   235   0.060
2   234   0.010
3   235   0.100

Use sort to sort on fields 1 and 3, 3 reversed. Then use uniq and compare only 1st char.

Another one using GNU awk:

$ awk '{
    a[$1][$3]=$0 }
END {   
    PROCINFO["sorted_in"]="@ind_num_asc"       # first for in ascending order
    for(i in a) {
        PROCINFO["sorted_in"]="@ind_num_desc"  # next for in descending
        for(j in a[i]) {
            print a[i][j]
            break
        }
    }
}' file
1   235   0.060
2   234   0.010
3   235   0.100



回答3:


Could you please try following. It should give the output in same order of Input_file's input sequence.

awk '
!a[$1]++{
  b[++count]=$1
}
{
  c[$1]=(c[$1]>$NF?c[$1]:$NF)
  d[$1]=(c[$1]>$NF?d[$1]:$1 OFS $2)
}
END{
  for(i=1;i<=count;i++){
    print d[b[i]],c[b[i]]
  }
}'  Input_file

Output will be as follows.

1 235 0.060
2 234 0.010
3 235 0.100

Explanation: Adding explanation for above code too here.

awk '
!a[$1]++{                              ##Checking condition if array named a has NO occurrence of $1 in it then do following.
  b[++count]=$1                        ##Create array b whose index is variable count with increasing value of 1 each time value is $1 for it.
}
{
  c[$1]=(c[$1]>$NF?c[$1]:$NF)          ##Creating array c value index is $1 and checking if $NF value is greater then its value then change it to $NF else no change.
  d[$1]=(c[$1]>$NF?d[$1]:$1 OFS $2)    ##Creating array d value index is $1 and checking if $NF value is greater then its value then change it to $NF else no change.
}
END{                                   ##Starting end block of awk program here.
  for(i=1;i<=count;i++){               ##Starting for loop here from i value 1 to till value of count.
    print d[b[i]],c[b[i]]              ##Printing value of array d whose index is value of b[i] and array c whose index is b[i].
  }
}' Input_file                          ##mentioning Input_file name here.



回答4:


This should work in any modern awk (not just GNU):

$ awk '!a[$1]||$3>b[$1]{a[$1]=$0;b[$1]=$3} END {for(i in a)print a[i]}' file | sort -n

Broken out for easier reading:

  • !a[$1] || $3>b[$1] - If we haven't seen the first column before OR the third column beats our previous record,
  • {a[$1]=$0;b[$1]=$3} - then store the current line in one array, and the comparison value in another array.
  • END {for(i in a)print a[i]} - Once we've processed all the input, print each line from our storage array.
  • sort -n - sort numerically. Should work with any sort of sort.

Clear as mud?

This solution notably stores entire lines ($0) rather than the content of individual fields, so its output will be the input lines, rather than recreate the input lines. This might be useful if you're comfortable with default field splitting for the sake of collecting fields for comparison, but you have columnated or tabbed input that you want your output to match.




回答5:


$ sort -k1,1n -k3,3nr file | awk '!seen[$1]++'
1   235   0.060
2   234   0.010
3   235   0.100


来源:https://stackoverflow.com/questions/52446221/find-the-maximum-values-in-nth-column-for-each-distinct-values-in-1st-column-in

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!