Pearson's Chi Square Test Python

Submitted by 时间秒杀一切 on 2019-12-22 00:28:23

Question


I have two arrays on which I would like to run a Pearson's chi-square test (goodness of fit). I want to test whether there is a significant difference between the expected and observed results.

observed = [11294, 11830, 10820, 12875]
expected = [10749, 10940, 10271, 11937]

I want to compare 11294 with 10749, 11830 with 10940, 10820 with 10271, etc.

Here's what I have:

>>> from scipy.stats import chisquare
>>> chisquare(f_obs=[11294, 11830, 10820, 12875],f_exp=[10749, 10940, 10271, 11937])
(203.08897607453906, 9.0718379533890424e-44)

where 203 is the chi-square test statistic and 9.07e-44 is the p-value. I'm confused by the results. Since the p-value of 9.07e-44 is below 0.05, we reject the null hypothesis and conclude that there is a significant difference between the observed and expected results. This doesn't seem correct to me because the numbers are so close. How do I fix this?


Answer 1:


In general, for a test of independence, the null hypothesis (H0) says that the two variables (X and Y) are independent, i.e. changing the values of X wouldn't affect the values of Y.

For example, X = [1,2,3,4] and Y = [2,4,6,8]

If you calculate the p-value for this case with a suitable test of association, it should come out to be a very small value, implying that data like this would be very unlikely if the null hypothesis were true, i.e. if X and Y were independent of each other.

So we reject the null hypothesis here: the two variables are dependent on each other, in the form Y = 2X.
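The answer does not name a method for the X and Y example. As one illustration, assuming Pearson's correlation coefficient is an acceptable stand-in for "any method", here is a minimal sketch in plain Python that measures how strongly the two lists move together:

```python
import math

x = [1, 2, 3, 4]
y = [2, 4, 6, 8]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Sums of cross-products and squared deviations from the means.
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
var_x = sum((a - mean_x) ** 2 for a in x)
var_y = sum((b - mean_y) ** 2 for b in y)

# Pearson correlation coefficient (assumed here as the illustrative method).
r = cov / math.sqrt(var_x * var_y)
print(r)
```

A coefficient at its maximum of 1 reflects the exact linear relationship Y = 2X, which is why a test whose null hypothesis is "no association" would be rejected for these values.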

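Your case is a goodness-of-fit test, so the null hypothesis is that the observed counts follow the expected counts. A p-value of 9.0718379533890424e-44 is read the same way: such a small value indicates that data like yours would be extremely unlikely if the null hypothesis were true, so the observed counts differ significantly from the expected ones. With counts this large, even differences that look small in relative terms produce a large test statistic.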

P.S. You are correct to reject the null hypothesis.
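To see where the statistic of about 203 comes from, the following sketch recomputes it by hand from the arrays in the question, using the Pearson formula (sum of (O - E)^2 / E) and only the standard library. The helper name `chi2_sf_3dof` is hypothetical, and its closed-form tail probability holds only for 3 degrees of freedom, which is what four categories with no estimated parameters give.

```python
import math

observed = [11294, 11830, 10820, 12875]
expected = [10749, 10940, 10271, 11937]

# Contribution of each category to the Pearson statistic: (O - E)^2 / E.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
statistic = sum(contributions)

# Degrees of freedom: number of categories minus one.
dof = len(observed) - 1

def chi2_sf_3dof(x):
    """Upper-tail probability of a chi-square variable; valid ONLY for 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

p_value = chi2_sf_3dof(statistic)

print(contributions)               # each category alone adds roughly 27 to 74
print(statistic, p_value)
print(sum(observed), sum(expected))  # the two totals are not equal
```

Each observed count is only about 5 to 8 percent above its expected count, but because the counts are in the tens of thousands, every category contributes a large amount to the statistic. Note also that the observed and expected totals differ (46819 versus 43897). To the best of my knowledge, recent SciPy releases raise a `ValueError` in `chisquare` when the two totals do not agree, so the call in the question may not run unchanged on a current installation.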



Source: https://stackoverflow.com/questions/29866961/pearsons-chi-square-test-python
