standard deviation of an arbitrary number of numbers using bc or other standard utilities

Submitted by …衆ロ難τιáo~ on 2019-12-01 03:33:35

Using awk:

standardDeviation=$(
    echo "$myNumbers" |
        awk '{sum+=$1; sumsq+=$1*$1}END{print sqrt(sumsq/NR - (sum/NR)^2)}'
)
echo "$standardDeviation"

Using Perl:

#!/usr/bin/env perl

use strict; use warnings;
use Math::NumberCruncher;

my @data = qw/
    0.556
    1.456
    45.111
    7.812
    5.001
/;

print Math::NumberCruncher::StandardDeviation(\@data);

Output

16.7631

Population standard deviation:

jq -s '(add/length)as$a|map(pow(.-$a;2))|add/length|sqrt'
ruby -e'a=readlines.map(&:to_f);puts (a.map{|x|(x-a.reduce(:+)/a.length)**2}.reduce(:+)/a.length)**0.5'
jq -s '(map(.*.)|add/length)-pow(add/length;2)|sqrt'
awk '{x+=$0;y+=$0^2}END{print sqrt(y/NR-(x/NR)^2)}'

In awk, ^ is in POSIX but ** is not. ** is supported by gawk and nawk but not by mawk.

Sample standard deviation (the first two commands are the same as the first two commands above, but length was replaced with length-1):

jq -s '(add/length)as$a|map(pow(.-$a;2))|add/(length-1)|sqrt'
ruby -e'a=readlines.map(&:to_f);puts (a.map{|x|(x-a.reduce(:+)/a.length)**2}.reduce(:+)/(a.length-1))**0.5'
R -q -e 'sd(scan("stdin"))'

Or use GNU Octave (which can do much more than a simple std):

standardDeviation="$(echo "${myNumbers}" | octave --eval 'disp(std(scanf("%f")))')"
echo $standardDeviation

Outputs

18.742

Given:

$ myNumbers=$(echo "0.556 1.456 45.111 7.812 5.001" | tr " " "\n")

First decide whether you need the sample standard deviation or the population standard deviation of those numbers.
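For reference, the two quantities differ only in the divisor. The notation below is an assumption made for illustration and is not taken from the answers: x_1 ... x_N are the values and \bar{x} is their mean.

```latex
\begin{aligned}
\bar{x} &= \frac{1}{N}\sum_{i=1}^{N} x_i \\
\sigma_{\text{population}} &= \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^2}
  = \sqrt{\frac{1}{N}\sum_{i=1}^{N}x_i^2 - \bar{x}^2} \\
s_{\text{sample}} &= \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})^2}
\end{aligned}
```

The second form of the population formula (mean of the squares minus the square of the mean) is the one the awk one-liners on this page compute from sum and sumsq.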

Population standard deviation (the function STDEV.P in Excel) requires the entire population of data. In Excel, text or blanks are skipped.

It is easily calculated on a rolling basis in awk:

$ echo "$myNumbers" | awk '$1+0==$1 {sum+=$1; sumsq+=$1*$1; cnt++}
                           END{print sqrt(sumsq/cnt - (sum/cnt)^2)}'
16.7631

Or in Ruby:

$ echo "$myNumbers" | ruby -e 'arr=$<.read.split(/\s/).map { |e| Float(e) rescue nil }.compact
                             sumsq=arr.inject(0) { |acc, e| acc+=e*e }
                             p (sumsq/arr.length - (arr.sum/arr.length)**2)**0.5'
16.76307799182477
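The sum-of-squares formula used above can lose precision when the mean is large compared with the spread of the values. A possible alternative, sketched here under the assumption that a POSIX awk is available, is Welford's online update, which is a different technique from the one in the answers. The function name welford_popsd is hypothetical, and the fixed four-decimal output format is a choice made for this example.

```shell
# Population standard deviation via Welford's online update (POSIX awk).
# welford_popsd is a hypothetical helper name; it reads one number per line on stdin.
welford_popsd() {
    awk '
        $1 + 0 == $1 {                  # keep numeric lines only, as in the answer above
            n++
            delta = $1 - mean           # distance from the previous running mean
            mean += delta / n           # update the running mean
            m2 += delta * ($1 - mean)   # accumulate the sum of squared deviations
        }
        END {
            if (n > 0) printf "%.4f\n", sqrt(m2 / n)
        }'
}

printf '%s\n' 0.556 1.456 45.111 7.812 5.001 | welford_popsd   # prints 16.7631
```

Like the running-sum version, this keeps only a few scalars in memory, so it still works on a stream of arbitrary length.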

For a sample standard deviation (the function STDEV.S in Excel, again ignoring text or blanks), the straightforward two-pass method needs the entire sample collected first, since the mean is subtracted from each value in the sample.

In awk:

$ echo "$myNumbers" |
     awk 'function sdev(array,    i, cnt, sum, mean, sqdif) {
     # names after the extra spaces are local variables
     for (i=1; i in array; i++)
        sum+=array[i]
     cnt=i-1
     mean=sum/cnt
     for (i=1; i in array; i++)
        sqdif+=(array[i]-mean)^2
     return sqrt(sqdif/(cnt-1))
     }
     $1+0==$1 {sum1[++cnt]=$1}
     END {print sdev(sum1)}'
18.7417

Or in Ruby:

$ ruby -lane 'BEGIN{col1=[]}
            col1 << Float($F[0]) rescue nil
            END {mean=col1.sum / col1.length
                 p (col1.inject(0){ |acc, e| acc+(e-mean)**2 } / 
                        (col1.length-1))**0.5
              }' <(echo "$myNumbers")
18.741690950925424
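If holding the whole sample in memory is inconvenient, the same sample value can also be obtained in a single pass from running sums, by dividing the corrected sum of squares by n-1. The following is a minimal sketch assuming a POSIX awk. The function name running_sampsd is hypothetical, and the four-decimal output format is a choice made for this example.

```shell
# Sample standard deviation (divisor n-1) from running sums, in one pass (POSIX awk).
# running_sampsd is a hypothetical helper name; it reads one number per line on stdin.
running_sampsd() {
    awk '
        $1 + 0 == $1 { n++; sum += $1; sumsq += $1 * $1 }
        END {
            # at least two values are needed for the n-1 divisor
            if (n > 1) printf "%.4f\n", sqrt((sumsq - sum * sum / n) / (n - 1))
        }'
}

printf '%s\n' 0.556 1.456 45.111 7.812 5.001 | running_sampsd   # prints 18.7417
```

This trades the stored array for two accumulators. For data whose mean is very large relative to its spread, the two-pass versions above are the safer choice numerically.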