How does one convert a Z-score from the Z-distribution (standard normal distribution, Gaussian distribution) to a p-value? I have yet to find the magical function in SciPy.
For SciPy lovers: though this is an old question, it is still relevant, and the same approach works for distributions other than the normal. Here is a solution for a few more distributions:
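from scipy.stats import chi2, norm, t

# Assumed rounding precision; the original answer did not define it.
decimal_limit = 6

def get_p_value_normal(z_score: float) -> float:
    """Get the upper-tail p-value for the normal (Gaussian) distribution.

    Args:
        z_score (float): z score
    Returns:
        float: p value
    """
    return round(norm.sf(z_score), decimal_limit)

def get_p_value_t(t_score: float, df: float) -> float:
    """Get the upper-tail p-value for the t distribution.

    Args:
        t_score (float): t statistic
        df (float): degrees of freedom (required by t.sf)
    Returns:
        float: p value
    """
    return round(t.sf(t_score, df), decimal_limit)

def get_p_value_chi2(chi2_stat: float, df: float) -> float:
    """Get the upper-tail p-value for the chi2 distribution.

    Args:
        chi2_stat (float): chi-squared statistic
        df (float): degrees of freedom
    Returns:
        float: p value
    """
    # sf gives a tail probability; ppf would map a probability to a quantile.
    return round(chi2.sf(chi2_stat, df), decimal_limit)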
From formula:
import numpy as np
import scipy.special as scsp

def z2p(z):
    """Return the lower-tail probability P(Z <= z) for a z-score."""
    # For the upper tail, use 1 - z2p(z) or, by symmetry, z2p(-z).
    return 0.5 * (1 + scsp.erf(z / np.sqrt(2)))
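As a rough check of the formula above, here is a minimal sketch that uses only the standard library's math.erf in place of scipy.special.erf. The names lower_tail and upper_tail are hypothetical helpers, not part of any library. The point is that the erf formula gives the lower-tail probability, so an upper-tail p-value has to be derived from it, for example by symmetry.

```python
import math

def lower_tail(z: float) -> float:
    """P(Z <= z) for a standard normal variable, from the erf formula."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def upper_tail(z: float) -> float:
    """P(Z >= z), using the symmetry P(Z >= z) == P(Z <= -z)."""
    return lower_tail(-z)

z = 1.64
print(lower_tail(z))  # lower-tail probability, about 0.9495
print(upper_tail(z))  # upper-tail p-value, about 0.0505
```

The two values sum to 1 up to floating-point rounding, which is a quick way to confirm which tail a given function returns.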
I think the cumulative distribution function (cdf) is preferable to the survival function. The survival function is defined as 1 - cdf, and it may miscommunicate the assumptions the language model uses for directional percentiles. Also, the percent point function (ppf) is the inverse of the cdf, which is very convenient.
>>> import scipy.stats as st
>>> st.norm.ppf(.95)
1.6448536269514722
>>> st.norm.cdf(1.64)
0.94949741652589625
Starting with Python 3.8, the standard library provides the NormalDist object as part of the statistics module.
It can be used to apply the inverse cumulative distribution function (inv_cdf, also known as the quantile function or the percent-point function) and the cumulative distribution function (cdf):
from statistics import NormalDist

NormalDist().inv_cdf(0.95)
# 1.6448536269514715
NormalDist().cdf(1.64)
# 0.9494974165258963
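Building on NormalDist, a two-sided p-value can be sketched with the standard library alone. The helper names two_sided_p_value and two_sided_critical_value are hypothetical; only NormalDist, cdf and inv_cdf come from the statistics module.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def two_sided_p_value(z: float) -> float:
    """Two-sided p-value: probability of a result at least as extreme as |z|."""
    # By symmetry, P(|Z| >= |z|) = 2 * P(Z <= -|z|).
    return 2.0 * std_normal.cdf(-abs(z))

def two_sided_critical_value(alpha: float) -> float:
    """|z| threshold at which the two-sided p-value equals alpha."""
    return std_normal.inv_cdf(1.0 - alpha / 2.0)

z_crit = two_sided_critical_value(0.05)
print(z_crit)                     # about 1.96
print(two_sided_p_value(z_crit))  # about 0.05
```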
I like the survival function (upper tail probability) of the normal distribution a bit better, because the function name is more informative:
import scipy.stats

p_values = scipy.stats.norm.sf(abs(z_scores))      # one-sided
p_values = scipy.stats.norm.sf(abs(z_scores)) * 2  # two-sided
The normal distribution "norm" is one of around 90 distributions in scipy.stats.
norm.sf also calls the corresponding function in scipy.special, as in gotgenes' example.
A small advantage of the survival function, sf: numerical precision should be better for quantiles close to 1 than when using the cdf.
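The precision point can be illustrated without SciPy. This is a sketch that uses math.erfc and math.erf as stand-ins for a directly computed upper tail and for 1 - cdf. That pairing is an assumption made for illustration, not a description of SciPy's internals.

```python
import math

z = 10.0
sqrt2 = math.sqrt(2.0)

# Upper tail computed directly from the complementary error function.
upper_direct = 0.5 * math.erfc(z / sqrt2)

# Upper tail computed as 1 - cdf: the cdf rounds to exactly 1.0 in double
# precision, so the subtraction loses the whole tail probability.
upper_from_cdf = 1.0 - 0.5 * (1.0 + math.erf(z / sqrt2))

print(upper_direct)    # tiny but nonzero, on the order of 1e-23
print(upper_from_cdf)  # 0.0
```

For moderate z-scores the two routes agree closely. The difference only matters far out in the tail.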
Aha! I found it: scipy.special.ndtr! This also appears to be under scipy.stats.stats.zprob as well (which is just a pointer to ndtr).
Specifically, given a one-dimensional numpy.array instance z_scores, one can obtain the p-values as
p_values = 1 - scipy.special.ndtr(z_scores)
or alternatively
p_values = scipy.special.ndtr(-z_scores)
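A small sketch comparing the two forms on an array, assuming NumPy and SciPy are installed. The sample z_scores values are made up for illustration, and the comparison with scipy.stats.norm.sf is added here as a cross-check.

```python
import numpy as np
from scipy.special import ndtr
from scipy.stats import norm

# Hypothetical sample of z-scores.
z_scores = np.array([-1.0, 0.0, 1.64, 2.5])

p_from_complement = 1 - ndtr(z_scores)  # upper tail as 1 - cdf
p_from_symmetry = ndtr(-z_scores)       # upper tail via symmetry
p_from_sf = norm.sf(z_scores)           # survival function from scipy.stats

# All three should agree to within floating-point tolerance here.
agree = np.allclose(p_from_complement, p_from_symmetry) and np.allclose(
    p_from_symmetry, p_from_sf
)
print(agree)
print(p_from_symmetry)
```

All three return upper-tail (one-sided) p-values. For z-scores far out in the tail, the ndtr(-z_scores) form is likely the safer choice, since it avoids subtracting a value near 1 from 1.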