Question
Thanks to @karakfa, the awk array script below produces the output shown. I am trying to add $2 to the array and output that as well. $2 is basically the number of times the unique entry appears. As I am still learning awk arrays, I do not know if my attempt is close.
Input:
chr1:955542-955763 AGRN:exon.1 1 0
chr1:955542-955763 AGRN:exon.1 2 0
chr1:985542-985763 AGRN:exon.2 1 0
chr1:985542-985763 AGRN:exon.2 2 1
My script:
awk '{k=$1 OFS $2;
l=$2; # Is this correct?
s[k]+=$4; c[k]++}
END{for(i in s) # Is this correct?
print i, s[i]/c[i]},
"(lbases)" # Is this correct?' input
Current output:
chr1:955542-955763 AGRN:exon.1 0
chr1:985542-985763 AGRN:exon.2 0.5
Desired output:
chr1:955542-955763 AGRN:exon.1 0 (2 bases)
chr1:985542-985763 AGRN:exon.2 0.5 (2 bases)
Answer 1:
Your attempt to introduce a new variable is not going to work. You need a count per array key, so the variable should be another array. But in this case, you don't need to add a new array, because the array c already contains the count per key.
awk '{k=$1 OFS $2;
s[k]+=$4; c[k]++}
END{for(i in s)
print i, s[i]/c[i], "(" c[i] " bases)" }' input
Notice also that your attempt put the "(lbases)" string outside the closing brace of the END block, so it never becomes part of the print statement.
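To check the approach end to end, here is a minimal self-contained sketch. It assumes a POSIX shell, writes the sample data from the question to a file named input, and uses printf so the count is wrapped in parentheses as in the desired output. The pipe through sort is an addition of this sketch: awk visits keys in a for (i in s) loop in an unspecified order, so sorting keeps the result stable.

```shell
# Sample data copied from the question, written to the file name the question uses.
cat > input <<'EOF'
chr1:955542-955763 AGRN:exon.1 1 0
chr1:955542-955763 AGRN:exon.1 2 0
chr1:985542-985763 AGRN:exon.2 1 0
chr1:985542-985763 AGRN:exon.2 2 1
EOF

out=$(awk '
  { k = $1 OFS $2          # key: region plus exon name
    s[k] += $4             # running sum of column 4 for this key
    c[k]++ }               # number of lines seen for this key
  END { for (i in s)
          printf "%s %s (%d bases)\n", i, s[i] / c[i], c[i] }
' input | sort)            # sort added here because for-in order is unspecified

printf '%s\n' "$out"
```

The average and the count both come from the same two arrays, s and c, so no extra variable is required.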
This differs from the problem description in that the key is not $2, but the combination of $1 and $2. If you genuinely need the key to be solely $2, you do need a new array, but then the whole thing will get quite a bit more complex.
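For illustration only, one way the $2-only variant might look is sketched below. This is not the answerer's code: the array name first is hypothetical, and remembering only the first $1 seen for each $2 is an assumption that suits the sample data, where each exon name maps to a single region. If the same $2 appeared with several different $1 values, more bookkeeping would be needed.

```shell
# Sample data copied from the question.
cat > input <<'EOF'
chr1:955542-955763 AGRN:exon.1 1 0
chr1:955542-955763 AGRN:exon.1 2 0
chr1:985542-985763 AGRN:exon.2 1 0
chr1:985542-985763 AGRN:exon.2 2 1
EOF

out2=$(awk '
  { k = $2                             # key on the exon name alone
    if (!(k in first)) first[k] = $1   # hypothetical array: first region seen per key
    s[k] += $4                         # running sum of column 4
    c[k]++ }                           # line count per key
  END { for (i in s)
          printf "%s %s %s (%d bases)\n", first[i], i, s[i] / c[i], c[i] }
' input | sort)                        # sort because for-in order is unspecified

printf '%s\n' "$out2"
```

On the sample input this prints the same two lines as the combined-key version, since $1 and $2 vary together there.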
Source: https://stackoverflow.com/questions/33023436/awk-array-to-output-the-line-count-as-well-as-average