Question
Is there some trick that would allow one to use bc (or some other standard utility) to return the standard deviation of an arbitrary number of numbers? For convenience, let's say that the numbers are stored in a Bash variable in the following way:
myNumbers="0.556
1.456
45.111
7.812
5.001"
So, the answer I'm looking for would be in a form such as the following:
standardDeviation="$(echo "${myNumbers}" | <insert magic here>)"
Answer 1:
Using awk:
standardDeviation=$(
echo "$myNumbers" |
awk '{sum+=$1; sumsq+=$1*$1}END{print sqrt(sumsq/NR - (sum/NR)^2)}'
)
echo "$standardDeviation"
Using Perl:
#!/usr/bin/env perl
use strict; use warnings;
use Math::NumberCruncher;
my @data = qw/
0.556
1.456
45.111
7.812
5.001
/;
print Math::NumberCruncher::StandardDeviation(\@data);
Output
16.7631
Answer 2:
Population standard deviation:
jq -s '(add/length)as$a|map(pow(.-$a;2))|add/length|sqrt'
ruby -e'a=readlines.map(&:to_f);puts (a.map{|x|(x-a.reduce(:+)/a.length)**2}.reduce(:+)/a.length)**0.5'
jq -s '(map(.*.)|add/length)-pow(add/length;2)|sqrt'
awk '{x+=$0;y+=$0^2}END{print sqrt(y/NR-(x/NR)^2)}'
In awk, ^ is in POSIX but ** is not. ** is supported by gawk and nawk but not by mawk.
Sample standard deviation (the first two commands are the same as the first two commands above, but with the divisor length replaced by length-1):
jq -s '(add/length)as$a|map(pow(.-$a;2))|add/(length-1)|sqrt'
ruby -e'a=readlines.map(&:to_f);puts (a.map{|x|(x-a.reduce(:+)/a.length)**2}.reduce(:+)/(a.length-1))**0.5'
R -q -e 'sd(scan("stdin"))'
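The commands above cover awk only for the population case. A minimal sketch of the sample version in POSIX awk follows, assuming one number per line on standard input. The function name sample_sd is made up for illustration, and the formula is the usual sum-of-squares shortcut with an n-1 divisor rather than anything taken from the answers here.

```shell
# sample_sd is a hypothetical helper name, not part of any tool.
# It reads one number per line and prints the sample standard deviation.
sample_sd() {
  awk '{ x += $0; y += $0 * $0 }
       END { if (NR > 1) printf "%.4f\n", sqrt((y - x * x / NR) / (NR - 1)) }'
}

myNumbers="0.556
1.456
45.111
7.812
5.001"
echo "$myNumbers" | sample_sd
```

For the numbers in the question this prints 18.7417. With fewer than two input lines it prints nothing, since the n-1 divisor would be zero.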
Answer 3:
Or use GNU Octave (which can do much more than a simple std):
standardDeviation="$(echo "${myNumbers}" | octave --eval 'disp(std(scanf("%f")))')"
echo $standardDeviation
Outputs
18.742
Answer 4:
Given:
$ myNumbers=$(echo "0.556 1.456 45.111 7.812 5.001" | tr " " "\n")
First decide if you need sample standard deviation vs population standard deviation of those numbers.
Population standard deviation (the function STDEV.P in Excel) requires the entire population of data. In Excel, text or blanks are skipped.
It is easily calculated on a running basis in awk:
$ echo "$myNumbers" | awk '$1+0==$1 {sum+=$1; sumsq+=$1*$1; cnt++}
END{print sqrt(sumsq/cnt - (sum/cnt)^2)}'
16.7631
Or in Ruby:
$ echo "$myNumbers" | ruby -e 'arr=$<.read.split(/\s/).map { |e| Float(e) rescue nil }.compact
sumsq=arr.inject(0) { |acc, e| acc+=e*e }
p (sumsq/arr.length - (arr.sum/arr.length)**2)**0.5'
16.76307799182477
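Both commands above are written to skip input that is not a number. A small sketch of that filter in action, using made-up input with a header line and an "n/a" entry; the wrapper name population_sd and the variable mixed are hypothetical and exist only for this illustration:

```shell
# population_sd is a hypothetical wrapper around the awk filter shown above.
# The pattern $1+0==$1 is true only when the first field looks like a number.
population_sd() {
  awk '$1+0==$1 { sum += $1; sumsq += $1 * $1; cnt++ }
       END { if (cnt > 0) printf "%d values, sd %.4f\n", cnt, sqrt(sumsq/cnt - (sum/cnt)^2) }'
}

# Made-up input: a text header and an "n/a" entry mixed in with the data.
mixed="value
0.556
1.456
n/a
45.111
7.812
5.001"
echo "$mixed" | population_sd
```

This prints "5 values, sd 16.7631": the two text lines are not counted, so the result matches the clean input.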
For a sample standard deviation (the function STDEV.S in Excel, which also ignores text and blanks), the straightforward two-pass calculation needs the entire sample collected first, since each value is compared against the mean.
In awk:
$ echo "$myNumbers" |
awk 'function sdev(array,    i, n, total, mean, sqdif) {  # extra params act as locals
  for (i=1; i in array; i++)
    total+=array[i]
  n=i-1
  mean=total/n
  for (i=1; i in array; i++)
    sqdif+=(array[i]-mean)^2
  return sqrt(sqdif/(n-1))
}
$1+0==$1 {sum1[++cnt]=$1}
END {print sdev(sum1)}'
18.7417
Or in Ruby:
$ ruby -lane 'BEGIN{col1=[]}
col1 << Float($F[0]) rescue nil
END {col1.compact!
mean=col1.sum / col1.length
p (col1.inject(0){ |acc, e| acc+(e-mean)**2 } /
(col1.length-1))**0.5
}' <(echo "$myNumbers")
18.741690950925424
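The two-pass versions above hold every value in memory. As an alternative sketch, not taken from the answer itself, Welford's online update keeps only a count, a running mean, and a running sum of squared deviations, so the sample standard deviation can be computed in a single pass. The function name welford_sd is hypothetical.

```shell
# welford_sd is a hypothetical helper: sample standard deviation in one pass
# using Welford's online update (n = count, mean = running mean,
# m2 = running sum of squared deviations from the mean).
welford_sd() {
  awk '$1+0==$1 {
         n++
         delta = $1 - mean
         mean += delta / n
         m2 += delta * ($1 - mean)
       }
       END { if (n > 1) printf "%.4f\n", sqrt(m2 / (n - 1)) }'
}

myNumbers="0.556
1.456
45.111
7.812
5.001"
echo "$myNumbers" | welford_sd
```

For the sample data this prints 18.7417, agreeing with the two-pass results. Dividing m2 by n instead of n - 1 would give the population figure.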
Source: https://stackoverflow.com/questions/15101343/standard-deviation-of-an-arbitrary-number-of-numbers-using-bc-or-other-standard