P-value from Chi sq test statistic in Python

借酒劲吻你 2020-12-24 01:36

I have computed a test statistic that is distributed as a chi square with 1 degree of freedom, and want to find out what P-value this corresponds to using python.

I

7 answers
  • 2020-12-24 01:47

    For ultra-high precision, when scipy's chi2.sf() isn't enough, bring out the big guns:

    >>> import numpy as np
    >>> from rpy2.robjects import r
    >>> np.exp(np.longdouble(r.pchisq(19000, 2, lower_tail=False, log_p=True)[0]))
    1.5937563168532229629e-4126
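For the specific call above (2 degrees of freedom), the result can be cross-checked without R, because the chi-square survival function with 2 degrees of freedom has the closed form exp(-x/2). The sketch below is only a sanity check for that special case, not a general replacement for pchisq; the helper name `log10_pvalue_df2` is made up for illustration. It stays in log space and returns a base-10 mantissa and exponent, so nothing underflows.

```python
import math

def log10_pvalue_df2(x):
    """Upper-tail chi-square p-value for df = 2, as (mantissa, exponent).

    Uses the closed form sf(x) = exp(-x / 2), which holds only for
    2 degrees of freedom, and works in log space to avoid underflow.
    """
    log10_p = -x / (2.0 * math.log(10.0))    # log10(exp(-x / 2))
    exponent = math.floor(log10_p)           # power of ten
    mantissa = 10.0 ** (log10_p - exponent)  # lies in [1, 10)
    return mantissa, exponent

mantissa, exponent = log10_pvalue_df2(19000)
print(f"{mantissa:.5f}e{exponent}")
```

The printed value should agree with the leading digits of the rpy2 result above; the mantissa carries only double precision, so this is a check rather than an extended-precision answer.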
    
  • 2020-12-24 01:50

    If you want to understand the math, the p-value of a sample, x (fixed), is

    P[P(X) <= P(x)] = P[m(X) >= m(x)] = 1 - G(m(x)^2)

    where,

    • P is the probability density of a (say k-variate) normal distribution with known covariance (cov) and mean,
    • X is a random variable drawn from that normal distribution,
    • m(x) is the Mahalanobis distance, sqrt(< cov^{-1} (x-mean), x-mean >). Note that in 1-d this is just the absolute value of the z-score,
    • G is the CDF of the chi^2 distribution with k degrees of freedom.

    So if you're computing the p-value of a fixed observation, x, then you compute m(x) (generalized z-score), and 1-G(m(x)^2).

    For example, it is well known that if x is sampled from a univariate (k = 1) normal distribution and has z-score = 2 (it is 2 standard deviations from the mean), then the two-sided p-value is about 0.046 (see a z-score table):

    In [7]: from scipy.stats import chi2
    
    In [8]: k = 1
    
    In [9]: z = 2
    
    In [10]: 1-chi2.cdf(z**2, k)
    Out[10]: 0.045500263896358528
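In the univariate case the identity above can be checked with the standard library alone: for k = 1, 1 - G(z^2) equals the two-sided normal tail probability erfc(|z| / sqrt(2)). A minimal sketch (the function name is illustrative, not taken from any library):

```python
import math

def two_sided_p_from_z(z):
    """Two-sided normal p-value; equals 1 - G(z**2) for chi^2 with k = 1."""
    return math.erfc(abs(z) / math.sqrt(2.0))

p = two_sided_p_from_z(2)
print(p)  # approximately 0.0455, in line with the chi2 result above
```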
    
  • 2020-12-24 02:06

    You meant to do:

    >>> from scipy import stats
    >>> 1 - stats.chi2.cdf(3.84, 1)
    0.050043521248705147
    
  • 2020-12-24 02:08

    Quick refresher here:

    Probability Density Function: think of it as a point value; how dense is the probability at a given point?

    Cumulative Distribution Function: this is the mass of probability of the function up to a given point; what percentage of the distribution lies on one side of this point?

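    In your case, you took the PDF, and the value you got is the correct density at that point, but a density is not a p-value. The p-value is the upper-tail probability, 1 - CDF:

    >>> from scipy import stats
    >>> 1 - stats.chi2.cdf(3.84, 1)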
    0.050043521248705147
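To make the distinction concrete for 1 degree of freedom, both quantities have simple closed forms: the density is exp(-x/2) / sqrt(2*pi*x) and the upper-tail probability is erfc(sqrt(x/2)). The sketch below uses only the standard library; the function names are illustrative and the formulas apply only to df = 1.

```python
import math

def chi2_pdf_df1(x):
    """Chi-square density at x for 1 degree of freedom."""
    return math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi * x)

def chi2_sf_df1(x):
    """Chi-square upper-tail probability (1 - CDF) for 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

x = 3.84
density = chi2_pdf_df1(x)  # height of the curve at x, not a probability
p_value = chi2_sf_df1(x)   # area under the curve to the right of x
print(f"density = {density:.4f}, p-value = {p_value:.4f}")
```

The density comes out near 0.03 and the tail area near 0.05; only the second is the p-value.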
    

  • 2020-12-24 02:08

    To calculate the p-value from a chi-squared statistic and its degrees of freedom, you can also call chisqprob:

    >>> from scipy.stats import chisqprob
    >>> chisqprob(3.84, 1)
    0.050043521248705189
    

    Notice:

    chisqprob is deprecated! stats.chisqprob is deprecated in scipy 0.17.0; use stats.distributions.chi2.sf instead
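Since the deprecation message points to chi2.sf, code written against the old name can be kept working with a thin wrapper. A minimal sketch, assuming SciPy is installed; the wrapper simply reuses the old name for convenience and is not part of current SciPy:

```python
from scipy.stats import chi2

def chisqprob(chisq, df):
    """Stand-in for the deprecated scipy.stats.chisqprob, built on chi2.sf."""
    return chi2.sf(chisq, df)

p = chisqprob(3.84, 1)
print(p)  # approximately 0.05
```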

  • 2020-12-24 02:08

    Update: as noted, chisqprob() is deprecated from scipy version 0.17.0 onwards. High-accuracy chi-square p-values can now be obtained via scipy.stats.distributions.chi2.sf(), for example:

    >>> from scipy.stats.distributions import chi2
    >>> chi2.sf(3.84, 1)
    0.050043521248705189
    >>> chi2.sf(1424, 1)
    1.2799986253099803e-311
    

    While stats.chisqprob() and 1 - stats.chi2.cdf() agree for small chi-square values, for large chi-square values the former is preferable. The latter cannot produce a nonzero p-value much smaller than machine epsilon, and gives very inaccurate answers close to machine epsilon. As shown by others, the two methods give comparable results for small chi-square values:

    >>> from scipy.stats import chisqprob, chi2
    >>> chisqprob(3.84, 1)
    0.050043521248705189
    >>> 1 - chi2.cdf(3.84, 1)
    0.050043521248705147
    

    Using 1-chi2.cdf() breaks down here:

    >>> 1 - chi2.cdf(67, 1)
    2.2204460492503131e-16
    >>> 1 - chi2.cdf(68, 1)
    1.1102230246251565e-16
    >>> 1 - chi2.cdf(69, 1)
    1.1102230246251565e-16
    >>> 1 - chi2.cdf(70, 1)
    0.0
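The breakdown is ordinary floating-point cancellation: once the CDF is within rounding distance of 1.0, subtracting it from 1 loses most or all significant digits. The same effect can be reproduced with the standard library for 1 degree of freedom, where the CDF is erf(sqrt(x/2)) and the survival function is erfc(sqrt(x/2)). This is an illustrative sketch of the mechanism, not SciPy's implementation:

```python
import math

x = 100.0                # a large chi-square statistic, df = 1
t = math.sqrt(x / 2.0)

via_cdf = 1.0 - math.erf(t)  # the CDF has already rounded to 1.0
via_sf = math.erfc(t)        # tail computed directly, no subtraction

print(via_cdf, via_sf)
```

The subtraction returns exactly zero, while the direct tail computation still returns a small positive number.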
    

    Whereas chisqprob() gives you accurate results for a much larger range of chi-square values, producing p-values nearly as small as the smallest float greater than zero, until it too underflows:

    >>> chisqprob(67, 1)
    2.7150713219425247e-16
    >>> chisqprob(68, 1)
    1.6349553217245471e-16
    >>> chisqprob(69, 1)
    9.8463440314253303e-17
    >>> chisqprob(70, 1)
    5.9304458500824782e-17
    >>> chisqprob(500, 1)
    9.505397766554137e-111
    >>> chisqprob(1000, 1)
    1.7958327848007363e-219
    >>> chisqprob(1424, 1)
    1.2799986253099803e-311
    >>> chisqprob(1425, 1)
    0.0
    