Why is my Kurtosis function not producing the same output as scipy.stats.kurtosis?

问题

I have a homework problem in which I'm supposed to write a function for Kurtosis as descirbed here:

The theta in the denominator is the standard deviation (square-root of the variance) and the x-with-the-bar in the numerator is the mean of x.

I've implemented the function as follows:

import numpy as np
from scipy.stats import kurtosis

testdata = np.array([1, 2, 3, 4, 5])

def mean(obs):
    return (1. / len(obs)) * np.sum(obs)

def variance(obs):
    return (1. / len(obs)) * np.sum((obs - mean(obs)) ** 2)

def kurt(obs):
    num = np.sqrt((1. / len(obs)) * np.sum((obs - mean(obs)) ** 4))
    denom = variance(obs) ** 2  # avoid losing precision with np.sqrt call
    return num / denom

The first two functions, mean and variance were successfully cross-validated with numpy.mean and numpy.var, respectively.

I attempted to cross-validate kurt with the following statement:

>>> kurtosis(testdata) == kurt(testdata)
False

Here's the output of both kurtosis functions:

>>> kurtosis(testdata)  # scipy.stats
-1.3

>>> kurt(testdata)  # my crappy attempt
0.65192024052026476

Where did I go wrong? Is scipy.stats.kurtosis doing something fancier than what's in the equation I've been given?

回答1:

By default, scipy.stats.kurtosis():

Computes excess kurtosis (i.e. subtracts 3 from the result).
Corrects for statistical biases (this affects some of the denominators).

Both behaviours are configurable through optional arguments to scipy.stats.kurtosis().

Finally, the np.sqrt() call in your method is unnecessary since there's no square root in the formula. Once I remove it, the output of your function matches what I get from kurtosis(testdata, False, False).

I attempted to cross-validate kurt with the following statement

You shouldn't be comparing floating-point numbers for exact equality. Even if the mathematical formulae are the same, small differences in how they are translated into computer code could affect the result of the computation.

Finally, if you're going to be writing numerical code, I strongly recommend reading What Every Computer Scientist Should Know About Floating-Point Arithmetic.

P.S. This is the function I've used:

In [51]: def kurt(obs):
   ....:     num = np.sum((obs - mean(obs)) ** 4)/ len(obs)
   ....:     denom = variance(obs) ** 2  # avoid losing precision with np.sqrt call
   ....:     return num / denom

来源：https://stackoverflow.com/questions/13571198/why-is-my-kurtosis-function-not-producing-the-same-output-as-scipy-stats-kurtosi

标签

python

numpy

statistics

scipy