Convert Z-score (Z-value, standard score) to p-value for normal distribution in Python

野趣味 2020-12-12 18:19

How does one convert a Z-score from the Z-distribution (standard normal distribution, Gaussian distribution) to a p-value? I have yet to find the magical function in SciPy.

7 Answers
  • 2020-12-12 18:31

    For SciPy lovers: although this is an old question, it is still relevant, and the same approach works for distributions other than the normal. Here is a solution for a few more distributions (the t and chi-squared versions also need the degrees of freedom):

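    from scipy.stats import chi2, norm, t

    DECIMAL_LIMIT = 6  # hypothetical choice: decimal places to round p-values to


    def get_p_value_normal(z_score: float) -> float:
        """Get the one-sided (upper-tail) p-value for the normal (Gaussian) distribution.

        Args:
            z_score (float): z score

        Returns:
            float: p value
        """
        return round(norm.sf(z_score), DECIMAL_LIMIT)


    def get_p_value_t(t_score: float, df: float) -> float:
        """Get the one-sided (upper-tail) p-value for Student's t distribution.

        Args:
            t_score (float): t statistic
            df (float): degrees of freedom

        Returns:
            float: p value
        """
        return round(t.sf(t_score, df), DECIMAL_LIMIT)


    def get_p_value_chi2(chi2_score: float, df: float) -> float:
        """Get the upper-tail p-value for the chi-squared distribution.

        Args:
            chi2_score (float): chi-squared statistic
            df (float): degrees of freedom

        Returns:
            float: p value
        """
        # sf (not ppf) maps a statistic to a tail probability
        return round(chi2.sf(chi2_score, df), DECIMAL_LIMIT)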
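    As a cross-check that is not part of the original answer, two standard identities tie these distributions back to the normal: a chi-squared variable with 1 degree of freedom is the square of a standard normal, so its upper tail at z² equals the two-sided normal p-value, and Student's t approaches the normal as the degrees of freedom grow. The z of 1.96 and the large df below are illustrative choices only.

```python
from scipy.stats import chi2, norm, t

z = 1.96  # illustrative z-score

# Two-sided normal p-value: P(|Z| >= |z|)
p_two_sided_normal = 2 * norm.sf(abs(z))

# Chi-squared with df=1 is the distribution of Z**2,
# so its upper tail at z**2 is the same probability
p_chi2_df1 = chi2.sf(z ** 2, df=1)

# Student's t upper tail approaches the normal upper tail for large df
p_t_large_df = t.sf(z, df=1_000_000)
p_normal_upper = norm.sf(z)

print(p_two_sided_normal, p_chi2_df1)
print(p_t_large_df, p_normal_upper)
```

    For small df the t tail is noticeably heavier, which is why the t helper needs df rather than treating its input as a plain z-score.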
    
  • 2020-12-12 18:34

    From the formula for the standard normal CDF, Φ(z) = ½[1 + erf(z/√2)]:

    import numpy as np
    import scipy.special as scsp
    def z2p(z):
        """From z-score return p-value."""
        return 0.5 * (1 + scsp.erf(z / np.sqrt(2)))
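    The same formula works without SciPy through the standard library's math.erf. A minimal sketch follows; the function names are hypothetical. It yields the lower-tail probability P(Z ≤ z), and for the upper tail math.erfc avoids the cancellation that 1 - cdf suffers far out in the tail.

```python
import math


def z_to_p_lower(z: float) -> float:
    """Lower-tail probability P(Z <= z) via the error function (hypothetical helper)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def z_to_p_upper(z: float) -> float:
    """Upper-tail probability P(Z > z) via erfc, avoiding 1 - cdf cancellation."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


print(z_to_p_lower(1.96))  # close to 0.975
print(z_to_p_upper(10.0))  # tiny but nonzero
```

    At z = 10 the lower-tail value rounds to exactly 1.0 in double precision, so 1 - z_to_p_lower(10) gives 0.0, while the erfc form still returns a meaningful tail probability.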
    
  • 2020-12-12 18:37

    I think the cumulative distribution function (cdf) is preferable to the survival function. The survival function is defined as 1 - cdf, and it can obscure which tail (direction) the percentiles refer to. Also, the percent point function (ppf) is the inverse of the cdf, which is very convenient.

    >>> import scipy.stats as st
    >>> st.norm.ppf(.95)
    1.6448536269514722
    >>> st.norm.cdf(1.64)
    0.94949741652589625
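    To make the cdf/ppf pairing concrete, here is a small sketch using only scipy.stats.norm methods: ppf and cdf are inverses, sf is the complement of cdf, and isf is the inverse of sf. The probability 0.95 is just an example value.

```python
import scipy.stats as st

p = 0.95
z = st.norm.ppf(p)                 # critical value for lower-tail probability p

print(st.norm.cdf(z))              # cdf undoes ppf, recovering p
print(st.norm.sf(z), 1 - st.norm.cdf(z))  # sf is the complement of cdf
print(st.norm.isf(1 - p))          # isf inverts sf: same critical value as ppf(p)
```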
    
  • 2020-12-12 18:39

    Starting Python 3.8, the standard library provides the NormalDist object as part of the statistics module.

    It can be used to apply the inverse cumulative distribution function (inv_cdf, also known as the quantile function or the percent-point function) and the cumulative distribution function (cdf):

    from statistics import NormalDist

    NormalDist().inv_cdf(0.95)
    # 1.6448536269514715
    NormalDist().cdf(1.64)
    # 0.9494974165258963
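    Building on the NormalDist answer, a two-sided p-value and the matching critical value can be computed with the standard library alone. The helper name two_sided_p is hypothetical. NormalDist has no survival function, so the upper tail is taken as 1 - cdf, which is accurate for moderate z but loses precision far out in the tail.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def two_sided_p(z: float) -> float:
    """Two-sided p-value P(|Z| >= |z|) for a standard normal Z (hypothetical helper)."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


# Critical value for a two-sided test at alpha = 0.05
z_crit = STD_NORMAL.inv_cdf(1 - 0.05 / 2)

print(two_sided_p(1.96))  # close to 0.05
print(z_crit)             # close to 1.96
```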
    
  • 2020-12-12 18:42

    I like the survival function (upper tail probability) of the normal distribution a bit better, because the function name is more informative:

    import scipy.stats

    # z_scores: a NumPy array of z-scores
    p_values = scipy.stats.norm.sf(abs(z_scores))      # one-sided

    p_values = scipy.stats.norm.sf(abs(z_scores)) * 2  # two-sided
    

    The normal distribution, "norm", is one of around 90 distributions in scipy.stats.

    norm.sf also calls the corresponding function in scipy.special, as in gotgenes's example.

    A small advantage of the survival function, sf: its numerical precision should be better than 1 - cdf for quantiles close to 1 (that is, far out in the upper tail).
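    The precision point can be checked directly. In this sketch the z-scores are illustrative; for moderate values sf and 1 - cdf agree, but once the cdf rounds to exactly 1.0 in double precision, 1 - cdf collapses to 0 while sf still returns the true tail probability.

```python
import numpy as np
from scipy.stats import norm

# Illustrative z-scores; the last two sit deep in the upper tail
z_scores = np.array([1.0, 3.0, 10.0, 20.0])

one_sided_sf = norm.sf(np.abs(z_scores))           # upper tail computed directly
one_sided_cdf = 1.0 - norm.cdf(np.abs(z_scores))   # upper tail via 1 - cdf
two_sided_sf = 2.0 * norm.sf(np.abs(z_scores))     # two-sided p-values

for z, a, b in zip(z_scores, one_sided_sf, one_sided_cdf):
    print(f"z={z:5.1f}  sf={a:.3e}  1-cdf={b:.3e}")
```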

  • 2020-12-12 18:46

    Aha! I found it: scipy.special.ndtr! It also appears to be available as scipy.stats.stats.zprob (which is just a pointer to ndtr), although that alias only exists in older SciPy releases.

    Specifically, given a one-dimensional numpy.array instance z_scores, one can obtain the p-values as

    import scipy.special

    p_values = 1 - scipy.special.ndtr(z_scores)
    

    or alternatively

    import scipy.special

    p_values = scipy.special.ndtr(-z_scores)
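    The second form works because of the symmetry Φ(-z) = 1 - Φ(z), so ndtr(-z) is the upper-tail probability without the subtraction. A short sketch with illustrative values, checking it against scipy.stats.norm.sf and using scipy.special.ndtri (the inverse of ndtr) to map p-values back to z-scores:

```python
import numpy as np
from scipy.special import ndtr, ndtri
from scipy.stats import norm

z_scores = np.array([-2.0, 0.0, 1.5, 3.0])  # illustrative values

p_upper = ndtr(-z_scores)         # P(Z > z) via symmetry: Phi(-z) = 1 - Phi(z)
p_upper_ref = norm.sf(z_scores)   # same quantity through scipy.stats
z_back = -ndtri(p_upper)          # ndtri inverts ndtr, recovering the z-scores

print(p_upper)
print(z_back)
```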
    