Python SciPy chisquare test returns different p value from Excel and LibreOffice

空扰寡人 提交于 2019-12-02 02:10:51

问题


After reading a recent blog post about an application of the Poisson distribution, I tried reproducing its findings using Python's 'scipy.stats' module, as well as Excel/LibreOffice 'POISSON' and 'CHITEST' functions.

For the expected values shown in the article, I simply used:

import scipy.stats
for i in range(8):
    print(scipy.stats.poisson.pmf(i, 2)*31)

This reproduces the table shown in the blog post - and I also recreated it from within LibreOffice, using a first column A that has values 0 to 7 in cells A1, A2, ..., A8, and the simple formula '=POISSON(A1, 2, 0)*31' repeated in the first 8 lines of column B.

So far so good - now for the chi-squared p-test value:

Under LibreOffice, I just wrote down the observed values in cells C1-C8, and used '=CHITEST(C1:C8, B1:B8)' to reproduce the article's reported p-value of 0.18. Under scipy.stats however, I can't seem to reproduce this value:

import numpy as np
import scipy.stats

obs = [4, 10, 7, 5, 4, 0, 0, 1]
exp = [scipy.stats.poisson.pmf(i, 2)*31 for i in range(8)]

# we only estimated one variable (the rate of 2 killings per year via 62/31) 
# so dof will be N-1-estimates
estimates = 1
print(scipy.stats.chisquare(np.array(obs), np.array(exp), ddof=len(obs)-1-estimates))
# (10.112318133864241, 0.0014728159441179519)
# the p-test value reported is 0.00147, not 0.18...
#
# Maybe I need to aggregate categories with observations less than 5 
# (as suggested in many textbooks of statistics for chi-squared tests)?
observedAggregateLessThan5 = [14, 7, 5, 5]
expectedAggregateLessThan5 = [exp[0]+exp[1], exp[2], exp[3], sum(exp[4:])]
print(scipy.stats.chisquare(np.array(observedAggregateLessThan5), np.array(expectedAggregateLessThan5), ddof=len(observedAggregateLessThan5)-1-estimates))
# (0.53561749342466913, 0.46425467595930309)
# Again the p-test value computed is not 0.18, it is 0.46...

What am I doing wrong?


回答1:


You are not using the ddof argument correctly. ddof is the change to make to the default degrees of freedom. The default is one less than the length. So you do not have to specify ddof at all:

In [21]: obs
Out[21]: [4, 10, 7, 5, 4, 0, 0, 1]

In [22]: exp
Out[22]: 
[4.1953937803349941,
 8.3907875606699882,
 8.3907875606699882,
 5.5938583737799901,
 2.796929186889995,
 1.1187716747559984,
 0.37292389158533251,
 0.10654968331009501]

In [23]: chisquare(obs, f_exp=array(exp))
Out[23]: (10.112318133864241, 0.1822973566091409)


来源:https://stackoverflow.com/questions/14037939/python-scipy-chisquare-test-returns-different-p-value-from-excel-and-libreoffice

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!