Compute a confidence interval from sample data

匿名 (未验证) 提交于 2019-12-03 02:29:01

问题:

I have sample data which I would like to compute a confidence interval for, assuming a normal distribution.

I have found and installed the numpy and scipy packages and have gotten numpy to return a mean and standard deviation (numpy.mean(data) with data being a list). Any advice on getting a sample confidence interval would be much appreciated.

回答1:

import numpy as np import scipy as sp import scipy.stats  def mean_confidence_interval(data, confidence=0.95):     a = 1.0*np.array(data)     n = len(a)     m, se = np.mean(a), scipy.stats.sem(a)     h = se * sp.stats.t._ppf((1+confidence)/2., n-1)     return m, m-h, m+h 

you can calculate like this way.



回答2:

Here a shortened version of shasan's code, calculating the 95% confidence interval of the mean of array a:

import numpy as np, scipy.stats as st  st.t.interval(0.95, len(a)-1, loc=np.mean(a), scale=st.sem(a)) 

But using StatsModels' tconfint_mean is arguably even nicer:

import statsmodels.stats.api as sms  sms.DescrStatsW(a).tconfint_mean() 

The underlying assumptions for both are that the sample (array a) was drawn independently from a normal distribution with unknown standard deviation (see MathWorld or Wikipedia).

For large sample size n, the sample mean is normally distributed, and one can calculate its confidence interval using st.norm.interval() (as suggested in Jaime's comment). But the above solutions are correct also for small n, where st.norm.interval() gives confidence intervals that are too narrow (i.e., "fake confidence"). See my answer to a similar question for more details (and one of Russ's comments here).

Here an example where the correct options give (essentially) identical confidence intervals:

In [9]: a = range(10,14)  In [10]: mean_confidence_interval(a) Out[10]: (11.5, 9.4457397432391215, 13.554260256760879)  In [11]: st.t.interval(0.95, len(a)-1, loc=np.mean(a), scale=st.sem(a)) Out[11]: (9.4457397432391215, 13.554260256760879)  In [12]: sms.DescrStatsW(a).tconfint_mean() Out[12]: (9.4457397432391197, 13.55426025676088) 

And finally, the incorrect result using st.norm.interval():

In [13]: st.norm.interval(0.95, loc=np.mean(a), scale=st.sem(a)) Out[13]: (10.23484868811834, 12.76515131188166) 


回答3:

Start with looking up the z-value for your desired confidence interval from a look-up table. The confidence interval is then mean +/- z*sigma, where sigma is the estimated standard deviation of your sample mean, given by sigma = s / sqrt(n), where s is the standard deviation computed from your sample data and n is your sample size.



标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!