I have 500 files named fort.1, fort.2, ..., fort.500. Each file contains 800 lines of data like this:
1 0.485
2 0.028
3 0.100

How can I average the second column, row by row, across all 500 files, writing the result to a new file?
awk, without making any assumption about the 1st column:
awk '{a[FNR]+=$2;b[FNR]++;}END{for(i=1;i<=FNR;i++)print i,a[i]/b[i];}' fort.*
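The same logic expanded with comments, if the one-liner is hard to read (FNR is the line number within the current file, so it restarts at 1 for each fort.* file):

awk '
    {
        a[FNR] += $2    # accumulate the 2nd column, keyed by line number within the file
        b[FNR]++        # count how many files contributed a value at this line number
    }
    END {
        for (i = 1; i <= FNR; i++)
            print i, a[i] / b[i]
    }
' fort.*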
Assuming the first column is an ID:
cat fort.* | awk '{sum[$1] += $2; counts[$1]++;} END {for (i in sum) print i, sum[i]/counts[i];}'
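The cat isn't strictly needed; awk can read the files directly:

awk '{sum[$1] += $2; counts[$1]++} END {for (i in sum) print i, sum[i]/counts[i]}' fort.*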
My understanding: each file is a set of measurements at a particular location. You want to aggregate the measurements across all locations, averaging the values on the same row of each file into a new file.
Assuming the first column can be treated as a row ID, each file contains 800 measurements, and there are 500 files:
cat fort.* | awk '
BEGIN {
    for (i = 1; i <= 800; i++)
        total[i] = 0
}
{ total[$1] += $2 }
END {
    for (i = 1; i <= 800; i++)
        print i, total[i]/500
}
'
First, we initialize an array to store the sum for each row across all files.
Then, we loop through the concatenated files, using the first column as the key for the row and summing the second column into the array.
Finally, we loop over the array and print the average value for each row across all files.
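If you'd rather not hardcode 800 and 500, here's a sketch of the same idea that picks both numbers up at run time (assuming every file has the same number of rows and only file names follow the script on the command line, so ARGC - 1 is the number of files and FNR at the END is the row count of the last file):

awk '
    { total[$1] += $2 }              # sum the 2nd column per row ID
    END {
        nfiles = ARGC - 1            # number of fort.* files given to awk
        for (i = 1; i <= FNR; i++)   # FNR is now the line count of the last file
            print i, total[i] / nfiles
    }
' fort.*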
Here's a quick way using paste and awk:
paste fort.* | awk '{ for (i = 2; i <= NF; i += 2) array[$1] += $i; print $1, array[$1]/(NF/2) }' > output.file
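To see why the loop steps by 2: paste joins line n of every file into one long record, so with just two files (and made-up values for fort.2) the pasted lines look like

1 0.485	1 0.492
2 0.028	2 0.031

The data values land in fields 2, 4, 6, ..., while $1 keeps the row ID from the first file, and NF/2 is the number of files.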
Like some of the other answers, here's another way, but this one pipes to sort to get numerically sorted output:
awk '{ sum[$1]+=$2; cnt[$1]++ } END { for (i in sum) print i, sum[i]/cnt[i] | "sort -n" }' fort.*
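If you're using GNU awk, a gawk-only alternative to the external sort is to set the array traversal order before the loop (a sketch):

gawk '{ sum[$1] += $2; cnt[$1]++ }
      END {
          PROCINFO["sorted_in"] = "@ind_num_asc"   # iterate indices in ascending numeric order
          for (i in sum) print i, sum[i]/cnt[i]
      }' fort.*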