Standard Deviation in R Seems to be Returning the Wrong Answer - Am I Doing Something Wrong?

谁都会走 posted on 2019-11-26 09:39:35

Question


A simple example of calculating the standard deviation:

d <- c(2,4,4,4,5,5,7,9)
sd(d)

yields

[1] 2.13809

but when done by hand, the answer is 2. What am I missing here?


Answer 1:


Try this

R> sd(c(2,4,4,4,5,5,7,9)) * sqrt(7/8)
[1] 2
R> 

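and see the Wikipedia article on standard deviation for the discussion of how standard deviations are estimated. The 'by hand' formula divides by N, which gives a biased estimate of the variance when the data are a sample, so sd() divides by N - 1 instead. Multiplying by sqrt((N-1)/N) converts R's result back to the by-hand value. Here is a key quote: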

The term standard deviation of the sample is used for the uncorrected estimator (using N) while the term sample standard deviation is used for the corrected estimator (using N − 1). The denominator N − 1 is the number of degrees of freedom in the vector of residuals, (x_1 − x̄, …, x_N − x̄).
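To make the two estimators in the quote concrete, here is a small sketch that computes both by hand for the vector from the question. The variable names (ss, sd_n, sd_n1) are only illustrative and do not come from the original answer.

```r
d <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(d)

# Sum of squared deviations from the mean
ss <- sum((d - mean(d))^2)

# Uncorrected estimator: divide by N (the "by hand" result)
sd_n <- sqrt(ss / n)

# Corrected estimator: divide by N - 1 (what sd() computes)
sd_n1 <- sqrt(ss / (n - 1))

print(c(by_hand = sd_n, corrected = sd_n1, r_sd = sd(d)))
```

The first value should match the by-hand answer from the question, and the second should agree with sd(d).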




Answer 2:


Looks like R is assuming (n-1) in the denominator, not n.
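Written out, the two definitions differ only in the denominator. In the sketch below, s stands for what sd() returns and \sigma_n for the divide-by-n version; these symbols are my own labels, not notation from the R documentation.

```latex
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2},
\qquad
\sigma_n = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}
         = s\,\sqrt{\frac{n-1}{n}}
```

For the data in the question, n = 8 and the sum of squared deviations is 32, so s = \sqrt{32/7} \approx 2.13809 and \sigma_n = \sqrt{32/8} = 2.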




Answer 3:


When I want the population variance or standard deviation (n as denominator), I define these two functions, each of which takes a numeric vector.

  pop.var <- function(x) var(x) * (length(x)-1) / length(x)

  pop.sd <- function(x) sqrt(pop.var(x))
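As a quick check, applying these helpers to the vector from the question should reproduce the by-hand value. The block repeats the two definitions so that it runs on its own.

```r
# Population variance: rescale var() from denominator n - 1 to n
pop.var <- function(x) var(x) * (length(x) - 1) / length(x)

# Population standard deviation
pop.sd <- function(x) sqrt(pop.var(x))

d <- c(2, 4, 4, 4, 5, 5, 7, 9)

print(pop.var(d))  # population variance (denominator n)
print(pop.sd(d))   # population standard deviation
print(sd(d))       # sample standard deviation (denominator n - 1), for comparison
```

Note that var() returns NA when x contains missing values, so these helpers assume x has none.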

BTW, Khan Academy has a good discussion of population and sample standard deviation.




Answer 4:


Note that running the command

?sd 

in RStudio (or any R console) displays the help page for the function. Its Details section states:

Like var this uses denominator n - 1.
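One way to confirm what the help page says is to compare sd() with sqrt(var()) on the same data. This is a minimal sketch using the vector from the question; the name same_as_var is just illustrative.

```r
d <- c(2, 4, 4, 4, 5, 5, 7, 9)

# sd() is the square root of var(), and both divide by n - 1
same_as_var <- isTRUE(all.equal(sd(d), sqrt(var(d))))
print(same_as_var)

# var() itself matches the hand-computed n - 1 formula
by_hand_var <- sum((d - mean(d))^2) / (length(d) - 1)
print(isTRUE(all.equal(var(d), by_hand_var)))
```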



Source: https://stackoverflow.com/questions/6457755/standard-deviation-in-r-seems-to-be-returning-the-wrong-answer-am-i-doing-some
