Normality test of a distribution in Python

Submitted by 孤者浪人 on 2019-12-03 05:54:17
David Robinson

This question explains why you're getting such a small p-value. Essentially, normality tests almost always reject the null hypothesis at very large sample sizes (in your data, for example, there is only a slight skew on the left side, but at your enormous sample size that is far more than enough to trigger a rejection).
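To make the sample-size effect concrete, here is a minimal sketch using made-up data rather than the asker's: normal noise with a small left-skewed component subtracted. The helper name normaltest_pvalue, the 0.3 skew weight, and the sample sizes are all illustrative assumptions. It runs scipy.stats.normaltest on the same kind of data at two sample sizes.

```python
import numpy as np
from scipy import stats


def normaltest_pvalue(n, seed=0):
    """p-value of D'Agostino-Pearson normaltest on n slightly left-skewed points."""
    rng = np.random.default_rng(seed)
    # Mostly normal data with a small exponential tail pulled to the left.
    sample = rng.normal(size=n) - 0.3 * rng.exponential(size=n)
    _, p_value = stats.normaltest(sample)
    return p_value


for n in (100, 1_000_000):
    print(f"n={n:>9,d}  p-value={normaltest_pvalue(n):.3g}")
```

The deviation from normality is the same in both runs; only the amount of evidence changes. At a million points the p-value should be vanishingly small, while at a hundred points it depends heavily on the particular draw.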

What would be much more useful in practice is to plot a normal curve fitted to your data. Then you can see how the data actually differ from the normal curve (for example, whether the tail on the left side really does extend too far). For example:

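import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import norm

# density=True scales the histogram to a total area of 1
n, bins, patches = plt.hist(array, 50, density=True)
mu = np.mean(array)
sigma = np.std(array)
# overlay the normal density fitted to the sample mean and standard deviation
plt.plot(bins, norm.pdf(bins, mu, sigma))
plt.show()

(Note the density=True argument: this ensures that the histogram is normalized to have a total area of 1, which makes it comparable to a density like the normal distribution. Older Matplotlib versions spelled this normed=1 and provided mlab.normpdf; both have since been removed, so scipy.stats.norm.pdf is used here instead.)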

Farhad Maleki

In general, when you have fewer than 50 samples, you should be careful about using tests of normality. These tests need enough evidence to reject the null hypothesis, which is "the distribution of the data is normal", and when the number of samples is small they are often unable to find that evidence.

Keep in mind that failing to reject the null hypothesis does not mean that the null hypothesis is true; it only means the test did not find enough evidence against it.
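The small-sample problem can be illustrated with a rough simulation. The sketch below draws clearly non-normal (uniform) data many times and counts how often the Shapiro-Wilk test rejects normality at the 5% level. The helper name rejection_rate, the choice of uniform data, and the trial count are assumptions made only for illustration.

```python
import numpy as np
from scipy import stats


def rejection_rate(n, trials=200, alpha=0.05, seed=0):
    """Fraction of trials in which Shapiro-Wilk rejects normality for uniform data."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(trials):
        sample = rng.uniform(size=n)  # uniform data is definitely not normal
        _, p_value = stats.shapiro(sample)
        if p_value < alpha:
            rejections += 1
    return rejections / trials


for n in (20, 200):
    print(f"n={n:>3d}  rejection rate={rejection_rate(n):.2f}")
```

A low rejection rate at the smaller sample size would not mean the data are normal; it would only show that the test lacks the power to detect the departure.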

There is another possibility: some implementations of statistical tests for normality compare the distribution of your data to the standard normal distribution (mean 0, standard deviation 1). To avoid this, I suggest standardizing the data first and then applying the test of normality.
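As a sketch of what this can look like in practice: scipy.stats.kstest(data, "norm") compares against the standard normal unless other parameters are passed. The data below are simulated (normal with mean 50 and standard deviation 10, values picked only for illustration), and the variable names are my own.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Normally distributed data, but far from mean 0 and standard deviation 1.
data = rng.normal(loc=50.0, scale=10.0, size=500)

# Tested as-is, the data are compared to N(0, 1) and rejected outright.
_, p_raw = stats.kstest(data, "norm")

# Standardize (subtract the mean, divide by the standard deviation), then test.
standardized = (data - data.mean()) / data.std(ddof=1)
_, p_std = stats.kstest(standardized, "norm")

print(f"raw data:          p-value={p_raw:.3g}")
print(f"standardized data: p-value={p_std:.3g}")
```

One caveat: because the mean and standard deviation are estimated from the same sample, the Kolmogorov-Smirnov p-value after standardizing is only approximate and tends to be too lenient. Tests such as scipy.stats.shapiro or scipy.stats.normaltest do not assume a particular mean or variance, so they do not need this step.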
