Question
I have a .txt file with two columns of values. They are 2D coordinates: the first column is the x value and the second is the z value. Unfortunately, some lines share the same x value but have different z values. I'd like to average those z values so that each x is associated with a single z. A sample of what I have:
435.212 108.894
435.212 108.897
435.212 108.9
435.212 108.903
As you can see, the x value 435.212 is associated with 4 different z values. What I'd like to have is:
435.212 108.8985
where 108.8985 is the result of (108.894+108.897+108.9+108.903)/4. Of course, I don't want to modify the other x and z values, so the result would look something like this:
BEFORE:
435.238 108.9
435.25 108.9
435.262 108.9
435.275 108.9
435.212 108.894 <---
435.212 108.897 <---
435.212 108.9 <---
435.212 108.903 <---
AFTER:
435.238 108.9
435.25 108.9
435.262 108.9
435.275 108.9
435.212 108.8985 <---average
The number of z values associated with a single x may vary.
I am using the Linux command line and thought of using awk for the job, although any other program or utility available on a Linux command line would be fine.
Answer 1:
This is one way with awk:
$ awk '{a[$1]+=$2; ++b[$1]} END {for (i in a) print i, a[i]/b[i]}' file
435.212 108.899
435.25 108.9
435.238 108.9
435.262 108.9
435.275 108.9
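Explanation
{a[$1]+=$2; ++b[$1]}
- Accumulate the sum of the z values (2nd column) for each x value (1st column) in the array a.
- Store the number of elements for each x value in the array b.
END {for (i in a) print i, a[i]/b[i]}
- After the whole file has been read, loop through the x values stored in the array and print each one with its average (sum divided by count). The order in which for (i in a) visits the keys is unspecified, which is why the output lines are not in input order.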
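To use a different number format (for example, 4 decimal places), replace print with printf. The x value is printed with %s so that it keeps its decimals (%d would truncate 435.212 to 435):
printf "%s %.4f\n", i, a[i]/b[i]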
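Putting the pieces above together, here is a minimal sketch of a complete run on the sample data from the question. It is a variation on the one-liner, not the exact command from the answer: it keeps the lines in first-seen input order by recording each new x value in an extra array, and it prints the average with 4 decimal places. The file name sample.txt and the array names sum, count and order are assumptions made for this illustration.

```shell
# Sample input copied from the question; sample.txt is a hypothetical file name.
cat > sample.txt <<'EOF'
435.238 108.9
435.25 108.9
435.262 108.9
435.275 108.9
435.212 108.894
435.212 108.897
435.212 108.9
435.212 108.903
EOF

result=$(awk '
    NF < 2 { next }                          # skip blank or incomplete lines
    !($1 in count) { order[++n] = $1 }       # remember each x the first time it appears
    { sum[$1] += $2; count[$1]++ }           # running sum and number of z values per x
    END {
        for (k = 1; k <= n; k++) {           # walk the x values in first-seen order
            x = order[k]
            printf "%s %.4f\n", x, sum[x] / count[x]
        }
    }
' sample.txt)

printf '%s\n' "$result"
```

With this input, the four 435.212 lines collapse into a single line, 435.212 108.8985, and the other x values are printed unchanged apart from the fixed 4-decimal format (108.9000). If the original order does not matter, the shorter for (i in a) loop from the answer is enough.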
Source: https://stackoverflow.com/questions/19357707/calculate-and-print-the-average-value-of-strings-in-a-column