The standard deviation differs between pandas and numpy. Why and which one is the correct one? (the relative difference is 3.5% which should not come from rounding, this is
In a nutshell, neither is "incorrect". Pandas by default uses N-1 in the denominator (Bessel's correction, which makes the variance estimate unbiased for a sample), whereas NumPy by default divides by N.
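Written out, the two conventions look roughly like this. The notation (x_i for the data points, \bar{x} for their mean, N for the number of points) is my own shorthand, not taken from either library's documentation:

$$
\sigma_{\text{ddof}=0} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^2},
\qquad
s_{\text{ddof}=1} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})^2},
\qquad
\frac{s_{\text{ddof}=1}}{\sigma_{\text{ddof}=0}} = \sqrt{\frac{N}{N-1}}
$$

The ratio depends only on N, so the gap shrinks as the sample grows. As a rough check, a relative difference of about 3.5% would be what you get for N around 15, since \sqrt{15/14} \approx 1.035.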
To make them behave the same, pass ddof=1 to numpy.std().
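A minimal sketch of the difference, assuming a small made-up list of values (the data and variable names are purely illustrative):

```python
import numpy as np
import pandas as pd

# Illustrative data, not from the question
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

np_default = np.std(values)             # NumPy default ddof=0: divides by N
pd_default = pd.Series(values).std()    # pandas default ddof=1: divides by N-1
np_sample = np.std(values, ddof=1)      # NumPy told to divide by N-1

print(np_default, pd_default, np_sample)
print(np.isclose(np_sample, pd_default))  # the two agree once ddof matches
```

With ddof=1 passed explicitly, the NumPy result should match the pandas default up to floating-point error.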
For further discussion, see
For pandas to behave the same as numpy, you can pass in the ddof=0 parameter, so df.std(ddof=0).
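For example, with a small made-up DataFrame (the column names and values here are hypothetical), the column-wise results can be compared like this:

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame, for illustration only
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0],
                   "b": [10.0, 10.0, 20.0, 20.0]})

pd_population = df.std(ddof=0)                  # pandas dividing by N, per column
np_population = np.std(df.to_numpy(), axis=0)   # NumPy default (ddof=0), per column

print(pd_population)
print(np.allclose(pd_population.to_numpy(), np_population))
```

Note the axis=0 on the NumPy side: without it, np.std on a 2-D array computes a single value over all elements rather than one per column.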
This short video explains quite well why n-1 might be preferred for samples: https://www.youtube.com/watch?v=Cn0skMJ2F3c