Different std in pandas vs numpy

前端未结


The standard deviation differs between pandas and numpy. Why and which one is the correct one? (the relative difference is 3.5% which should not come from rounding, this is

2 Answers

孤街浪徒

2020-12-08 19:52
In a nutshell, neither is "incorrect". By default, pandas divides by N-1 (ddof=1), which gives the unbiased estimator of the variance, whereas NumPy by default divides by N (ddof=0).

To make them behave the same, pass ddof=1 to numpy.std().
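As a rough illustration of the point above, the following sketch compares the defaults on a small made-up dataset (the values are illustrative and not taken from the question). It assumes only the standard `ddof` argument of `numpy.ndarray.std` and `pandas.Series.std`.

```python
import numpy as np
import pandas as pd

# Illustrative data (made up for this example, not from the question).
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

arr = np.array(values)
ser = pd.Series(values)

numpy_default = arr.std()        # NumPy default ddof=0: divides by N
pandas_default = ser.std()       # pandas default ddof=1: divides by N - 1
numpy_sample = arr.std(ddof=1)   # divides by N - 1, matching pandas

print("numpy  (ddof=0):", numpy_default)
print("pandas (ddof=1):", pandas_default)
print("numpy  (ddof=1):", numpy_sample)
```

With `ddof=1`, the NumPy result should agree with the pandas default up to floating-point error.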

For further discussion, see
- Can someone explain biased/unbiased population/sample standard deviation?
- Population variance and sample variance.
- Why divide by n-1?

清歌不尽

2020-12-08 20:00

To make pandas behave the same as numpy, pass ddof=0, e.g. df.std(ddof=0).
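A minimal sketch of this direction, using a hypothetical two-column DataFrame (the column names and values are made up for illustration). It also checks that the two conventions differ by a factor of sqrt(N / (N - 1)), which is why the gap is larger for small samples.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame; names and values are illustrative only.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0],
                   "b": [10.0, 20.0, 30.0, 50.0]})

pandas_population = df.std(ddof=0)             # per column, divides by N
numpy_population = df.to_numpy().std(axis=0)   # NumPy default ddof=0, per column

# The two pandas results differ by sqrt(N / (N - 1)) in every column.
n = len(df)
ratio = df.std() / df.std(ddof=0)              # pandas default (ddof=1) over ddof=0
expected_ratio = np.sqrt(n / (n - 1))

print(np.allclose(pandas_population.to_numpy(), numpy_population))
print(ratio.to_numpy(), expected_ratio)
```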

This short video explains quite well why n-1 might be preferred for samples. https://www.youtube.com/watch?v=Cn0skMJ2F3c
