Question
The file content is as follows:
333379266 834640619 88
333379280 834640621 99
333379280 834640621 66
333376672 857526666 99
333376672 857526666 78
333376672 857526666 62
The first two columns may contain duplicate pairs, and I want to output each pair of the first two columns together with the maximum value of the third column for that pair. In this case, the result file should be as follows:
333379266 834640619 88
333379280 834640621 99
333376672 857526666 99
My attempt is:
awk '{d[$1" "$2]=$3;if ($3>=d[$1" "$2]){num[$1" "$2]=$3} else{num[$1" "$2]=d[$1" "$2]} }END{for(i in num) print i,num[i]}'
But it does not work: since d[$1" "$2] is set to $3 right before the comparison, $3>=d[$1" "$2] is always true, so num is always set to $3. Because awk reads the file line by line, the value left in num for each key is always the last one, not the maximum.
I would appreciate it if anyone could give me a solution. Thanks in advance.
Answer 1:
Could you please try the following.
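awk '
BEGIN{
  SUBSEP=OFS      ## default SUBSEP is "\034"; use a space so that "print i" shows "col1 col2"
}
{
  ## keep the larger of the stored value and the current 3rd field;
  ## the first time a key is seen, store $3 unconditionally (also works for negative values)
  array[$1,$2]=(($1,$2) in array && array[$1,$2]>$3)?array[$1,$2]:$3
}
END{
  for(i in array){
    print i,array[i]
  }
}
' Input_file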
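Note that for (i in array) visits keys in an unspecified order, so the lines may not come out in the order shown in the question. If input order matters, here is a minimal sketch (assuming the key is the first two whitespace-separated fields and the 3rd field is numeric; max_by_key is just a hypothetical name for illustration) that remembers the order in which each key first appears:

```shell
#!/bin/sh
# Hypothetical helper: print each distinct (col1, col2) pair with the max of col3,
# in the order the pair first appears. Reads the files given as arguments, or stdin.
max_by_key() {
  awk '
  {
    k = $1 OFS $2                 # key: first two fields joined by a space
    if (!(k in max)) {            # first time this key is seen
      order[++n] = k              # remember its position
      max[k] = $3                 # the first value always counts, even if negative
    } else if ($3 > max[k]) {
      max[k] = $3                 # keep the running maximum
    }
  }
  END {
    for (j = 1; j <= n; j++)      # walk keys in first-appearance order
      print order[j], max[order[j]]
  }
  ' "$@"
}

# Usage with the sample data from the question:
printf '%s\n' \
  '333379266 834640619 88' \
  '333379280 834640621 99' \
  '333379280 834640621 66' \
  '333376672 857526666 99' \
  '333376672 857526666 78' \
  '333376672 857526666 62' | max_by_key
```

With the sample data this prints the three lines of the expected result in the same order as the question.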
Issues with OP's code:
The major issue I can see in OP's attempt is on the line d[$1" "$2]=$3;if ($3>=d[$1" "$2]): since array d is assigned the current line's 3rd field before the comparison, the condition is always true.
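A quick way to see this, using made-up two-line input (not from the question) where the value drops, is to print the result of the comparison right after the assignment:

```shell
#!/bin/sh
# Made-up input: the same key with a decreasing 3rd field.
result=$(printf '1 2 5\n1 2 3\n' | awk '{
  d[$1" "$2] = $3                 # the assignment happens first...
  print ($3 >= d[$1" "$2])        # ...so this compares $3 with itself
}')
echo "$result"                    # prints 1 on both lines (always true)
```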
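Fix for OP's attempt: IMHO my solution above should be good, but here is OP's attempt fixed. The comparison has to happen before d is updated, and d has to keep the running maximum rather than just the last value seen (otherwise a key with values 99, 78, 62 would end up with 78):
awk '{if ($3>=d[$1" "$2]){num[$1" "$2]=$3} else{num[$1" "$2]=d[$1" "$2]};d[$1" "$2]=num[$1" "$2]}END{for(i in num) print i,num[i]}' Input_file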
Answer 2:
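This one-liner applies the same idea as your code; the only difference is that the key is built with FS (a single space by default) instead of a literal space, and the value is taken from $NF (the last field, which is the 3rd field here) instead of $3.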
awk '{k=$1FS$2;a[k]=a[k]>$NF?a[k]:$NF}END{for(i in a)print i,a[i]}' file
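A different technique, not taken from either answer, is to let sort do the work: sort by the key columns and then by the 3rd column numerically in descending order, and keep only the first line of each key group. A minimal sketch, assuming whitespace-separated columns (max_by_key_sorted is a hypothetical name for illustration):

```shell
#!/bin/sh
# Hypothetical helper; reads the files given as arguments, or stdin if none.
max_by_key_sorted() {
  # Sort by col1, then col2, then col3 numerically in descending order,
  # so the first line of each (col1, col2) group carries the largest col3.
  sort -k1,1 -k2,2 -k3,3nr "$@" |
  # Print only the first line seen for each (col1, col2) pair.
  awk '!seen[$1, $2]++'
}

# Usage: max_by_key_sorted file
```

The output comes out in sorted key order rather than input order, but the original line is printed unchanged and sort can handle files larger than memory.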
Source: https://stackoverflow.com/questions/61487578/how-to-find-out-the-max-value-of-the-third-field-according-to-the-first-two-fiel