Calculate median of a file with many columns using awk

Submitted by …衆ロ難τιáo~ on 2020-02-16 08:11:49

Question


I am trying to calculate the median (not the mean) of each of several columns in a file. I wrote the following, adapted from code that works for a single column, and then ran it on the sample data shown after it:

sort -n <infile | awk '{for (i = 1; i <= NF; ++i); count[NR] = $i;}END {for (i = 1; i <= NF; ++i); if (NR % 2) {print count[(NR + 1) / 2];} else {print (count[(NR / 2)] + count[(NR / 2) + 1]) / 2;}}'

Composite cg00000029 cg00000108 cg00000109 cg00000165
TCGA-G4-6298-11A 0.309164840970903 0.108696904309357
TCGA-G4-6311-11A 0.284214936998384 0.192558185484861
TCGA-AA-3506-11A 0.293174399370542 0.12546425658397
TCGA-AA-3713-11A 0.225964654660289 0.150662194530275


Answer 1:


Consider using datamash, which computes the median of a chosen column directly:

$ cat input
Composite cg00000029 cg00000108 cg00000109 cg00000165
TCGA-G4-6298-11A 0.309164840970903 0.108696904309357
TCGA-G4-6311-11A 0.284214936998384 0.192558185484861
TCGA-AA-3506-11A 0.293174399370542 0.12546425658397
TCGA-AA-3713-11A 0.225964654660289 0.150662194530275

$ datamash --header-in -W median 2 < input
0.28869466818446

$ datamash --header-in -W median 3 < input
0.13806322555712

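Here --header-in treats the first input line as column headers, and -W (--whitespace) splits fields on runs of spaces or tabs. See datamash --help for the full list of options.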
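
If datamash is not available, the same result can be approximated in plain awk. The one-liner in the question has two problems: each for (...); ends in a semicolon, so the loop body is empty and $i is read with i = NF + 1, and sort -n orders whole lines by their first field rather than sorting each column independently. The following is only a sketch under some assumptions taken from the sample input: the first line is a header, column 1 is a row label, and every other field is numeric. The function name median_columns is made up for illustration, and the insertion sort is chosen only because POSIX awk has no built-in sort.

```shell
# Hypothetical helper: print "<column number> TAB <median>" for columns 2..N.
# Assumes line 1 is a header, column 1 is a label, other fields are numeric.
median_columns() {
  awk '
    NR == 1 { next }                      # skip the header line
    {
      if (NF > maxnf) maxnf = NF          # widest data row seen so far
      for (i = 2; i <= NF; i++)
        val[i, ++n[i]] = $i + 0           # store each column separately
    }
    END {
      for (i = 2; i <= maxnf; i++) {
        m = n[i]
        if (m == 0) continue              # no values in this column
        # insertion sort of the values of column i, ascending
        for (j = 2; j <= m; j++) {
          v = val[i, j]
          for (k = j - 1; k >= 1 && val[i, k] > v; k--)
            val[i, k + 1] = val[i, k]
          val[i, k + 1] = v
        }
        if (m % 2)
          med = val[i, (m + 1) / 2]       # odd count: middle value
        else
          med = (val[i, m / 2] + val[i, m / 2 + 1]) / 2   # even: mean of middle pair
        printf "%d\t%.14g\n", i, med
      }
    }
  ' "$@"
}
```

Calling median_columns input on the file above should print one line per data column (2 and 3 here), with the same medians datamash reports. The insertion sort is quadratic in the number of rows, so for large files datamash, or gawk with asort, is likely the better choice.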



Source: https://stackoverflow.com/questions/59285612/calculate-median-of-a-file-with-many-columns-using-awk
