Getting average per line

自古美人都是妖i 提交于 2019-12-22 07:11:52

问题


I have a large data set in this format

HF TLLL A T 0.999 NA 0.666 NA 0.566 NA NA 0.87
HF TLLM A T 0.500 0.500 0.666 0.566 NA NA 0.87

I want to calculate an average for each line, starting from column 5 until end of line, and ignoring the string NA. Then append the average to the end of each line.

The output would look like this:

HF TLLL A T 0.999 NA 0.666 NA 0.566 NA NA 0.87 0.775
HF TLLM A T 0.500 0.500 0.666 0.566 NA NA 0.87 0.620

I have been getting the sum like this, but can't figure out how to keep track of the number of integers that were summed, in order to calculate the average.

awk '{x=0;for(i=5;i<=NF;i++)x=x+$i;print $0, x}'

回答1:


$ cat file
HF TLLL A T 0.999 NA 0.666 NA 0.566 NA NA 0.87
HF TLLM A T 0.500 0.500 0.666 0.566 NA NA 0.87
HF TLLM A T NA NA NA NA NA NA NA

$ awk '{sum=cnt=0; for (i=5;i<=NF;i++) if ($i != "NA") { sum+=$i; cnt++ } print $0, (cnt ? sum/cnt : "NA") }' file
HF TLLL A T 0.999 NA 0.666 NA 0.566 NA NA 0.87 0.77525
HF TLLM A T 0.500 0.500 0.666 0.566 NA NA 0.87 0.6204
HF TLLM A T NA NA NA NA NA NA NA NA

The ternary expression avoids a divide by zero error on input row 3 where every data field is "NA".




回答2:


kent$  awk '{s=n=0;for(i=5;i<=NF;i++)if($i!="NA"){s+=$i*1;n++}printf "%s %.3f\n",$0,s/n}' file
HF TLLL A T 0.999 NA 0.666 NA 0.566 NA NA 0.87 0.775
HF TLLM A T 0.500 0.500 0.666 0.566 NA NA 0.87 0.620



回答3:


Using awk, you can do this:

awk '{for (i=5;i<=NF;i++) {if ($i!="NA") t++;a+=$i}print $0,a/t;a=t=0}' file
HF TLLL A T 0.999 NA 0.666 NA 0.566 NA NA 0.87 0.77525
HF TLLM A T 0.500 0.500 0.666 0.566 NA NA 0.87 0.6204


来源:https://stackoverflow.com/questions/19984012/getting-average-per-line

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!