Why is my Kurtosis function not producing the same output as scipy.stats.kurtosis?

Posted by 耗尽温柔 on 2019-12-13 11:50:40

Question


I have a homework problem in which I'm supposed to write a function for kurtosis as described here:

The theta in the denominator is the standard deviation (square-root of the variance) and the x-with-the-bar in the numerator is the mean of x.

I've implemented the function as follows:

import numpy as np
from scipy.stats import kurtosis

testdata = np.array([1, 2, 3, 4, 5])

def mean(obs):
    return (1. / len(obs)) * np.sum(obs)

def variance(obs):
    return (1. / len(obs)) * np.sum((obs - mean(obs)) ** 2)

def kurt(obs):
    num = np.sqrt((1. / len(obs)) * np.sum((obs - mean(obs)) ** 4))
    denom = variance(obs) ** 2  # avoid losing precision with np.sqrt call
    return num / denom

The first two functions, mean and variance, were successfully cross-validated against numpy.mean and numpy.var, respectively.

I attempted to cross-validate kurt with the following statement:

>>> kurtosis(testdata) == kurt(testdata)
False

Here's the output of both kurtosis functions:

>>> kurtosis(testdata)  # scipy.stats
-1.3

>>> kurt(testdata)  # my crappy attempt
0.65192024052026476

Where did I go wrong? Is scipy.stats.kurtosis doing something fancier than what's in the equation I've been given?


Answer 1:


By default, scipy.stats.kurtosis():

  1. Computes excess kurtosis (i.e. subtracts 3 from the result; fisher=True).
  2. Does not correct for statistical bias (bias=True). Passing bias=False applies a correction that affects some of the denominators.

Both behaviours are configurable through the optional fisher and bias arguments to scipy.stats.kurtosis().
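As a small sketch of how those two arguments change the result, here is scipy.stats.kurtosis() called on the question's testdata with each option toggled. The variable names are only for illustration, and the values in the comments are approximate.

```python
import numpy as np
from scipy.stats import kurtosis

testdata = np.array([1, 2, 3, 4, 5])

# Default call: excess (Fisher) kurtosis, no bias correction.
default = kurtosis(testdata)                 # about -1.3

# Plain (Pearson) kurtosis: the same estimator without the "- 3".
pearson = kurtosis(testdata, fisher=False)   # about 1.7

# Excess kurtosis with the bias correction switched on.
corrected = kurtosis(testdata, bias=False)   # about -1.2

print(default, pearson, corrected)
```

The plain, uncorrected value (fisher=False with the default bias=True) is the one a textbook population formula should reproduce.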

The np.sqrt() call in your method is also wrong, since there's no square root in the formula. Once I remove it, the output of your function matches what I get from kurtosis(testdata, fisher=False).
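To see where the numbers come from, here is the arithmetic for testdata worked by hand. This assumes the homework formula is the usual population kurtosis (fourth central moment divided by the squared variance), which is what the description in the question suggests. The standard deviation is written as \sigma here.

```latex
\begin{aligned}
\bar{x} &= \tfrac{1}{5}(1+2+3+4+5) = 3 \\
\sigma^2 &= \tfrac{1}{5}\sum_{i}(x_i-\bar{x})^2 = \tfrac{1}{5}(4+1+0+1+4) = 2 \\
m_4 &= \tfrac{1}{5}\sum_{i}(x_i-\bar{x})^4 = \tfrac{1}{5}(16+1+0+1+16) = 6.8 \\
\text{kurtosis} &= \frac{m_4}{\sigma^4} = \frac{6.8}{2^2} = 1.7 \\
\text{excess kurtosis} &= 1.7 - 3 = -1.3 \\
\text{with the stray square root:}\quad & \frac{\sqrt{6.8}}{2^2} \approx \frac{2.6077}{4} \approx 0.6519
\end{aligned}
```

The last line reproduces the 0.6519... from the question, and the excess-kurtosis line reproduces the -1.3 that scipy.stats.kurtosis() returns by default.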

Regarding "I attempted to cross-validate kurt with the following statement":

You shouldn't be comparing floating-point numbers for exact equality. Even if the mathematical formulae are the same, small differences in how they are translated into computer code could affect the result of the computation.
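A minimal sketch of a tolerance-based comparison, using np.isclose() in place of ==. The kurt() below is a self-contained rewrite for illustration (not the exact function from the question), and 1.7 is the hand-computed population kurtosis of the test data.

```python
import numpy as np

def kurt(obs):
    """Population (biased, non-excess) kurtosis: m4 / variance**2."""
    obs = np.asarray(obs, dtype=float)
    centered = obs - obs.mean()
    return np.mean(centered ** 4) / np.mean(centered ** 2) ** 2

testdata = np.array([1, 2, 3, 4, 5])

# Classic illustration: mathematically equal, but not bit-for-bit equal.
exact_equal = (0.1 + 0.2 == 0.3)            # False
close_enough = np.isclose(0.1 + 0.2, 0.3)   # True

# Compare the kurtosis against the expected value within a tolerance.
matches = np.isclose(kurt(testdata), 1.7)   # True

print(exact_equal, close_enough, matches)
```

np.isclose() uses default relative and absolute tolerances (rtol and atol); for test suites, np.testing.assert_allclose() performs a similar check and raises an error on mismatch.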

Finally, if you're going to be writing numerical code, I strongly recommend reading What Every Computer Scientist Should Know About Floating-Point Arithmetic.

P.S. This is the function I've used:

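def kurt(obs):
    # Fourth central moment; no square root, as in the formula.
    # Relies on mean() and variance() as defined in the question.
    num = np.sum((obs - mean(obs)) ** 4) / len(obs)
    # Squared variance, i.e. the standard deviation to the fourth power.
    denom = variance(obs) ** 2
    return num / denom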


Source: https://stackoverflow.com/questions/13571198/why-is-my-kurtosis-function-not-producing-the-same-output-as-scipy-stats-kurtosi
