How to get a sigmoidal CDF curve using scipy.stats.norm.cdf and matplotlib?

Question

I am trying to plot the S-shaped cumulative distribution function (cdf) curve of a normal distribution. However, my plot of the cdf values looks like a uniform distribution instead. What am I doing wrong?

Test Script

import numpy as np
from numpy.random import default_rng
from scipy.stats import norm
import matplotlib.pyplot as plt

siz = 1000
rg = default_rng( 12345 )
a = rg.random(size=siz)
rg = default_rng( 12345 )
b = norm.rvs(size=siz, random_state=rg)
c = norm.cdf(b)

print( 'a = ', a)
print( 'b = ', b)
print( 'c = ', c)

fig, ax = plt.subplots(3, 1)
acount, abins, aignored = ax[0].hist( a, bins=20, histtype='bar', label='a', color='C0' )
bcount, bbins, bignored = ax[1].hist( b, bins=20, histtype='bar', label='b', color='C1' )
ccount, cbins, cignored = ax[2].hist( c, bins=20, histtype='bar', label='c', color='C2' )
print( 'acount, abins, aignored = ', acount, abins, aignored)
print( 'bcount, bbins, bignored = ', bcount, bbins, bignored)
print( 'ccount, cbins, cignored = ', ccount, cbins, cignored)
ax[0].legend()
ax[1].legend()
ax[2].legend()
plt.show()

Answer 1:

I don't know your particular application, but I think the problem is that you are evaluating the cdf at a set of normally distributed random numbers and then plotting a histogram of the results. Below is a code example that plots the CDF of a standard normal from -3 to +3 using evenly spaced, ordered x values.

import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

x = np.arange(-3, 3, 0.1)
c = norm.cdf(x)

plt.plot(x, c)
plt.show()

[Figure: CDF of standard normal]
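The same idea can be applied to the random sample from the question. Because norm.cdf is monotonically increasing, sorting the variates before plotting them against their cdf values should also trace the S curve. This is only a minimal sketch: the helper name sorted_cdf_points is hypothetical, and the seed and sample size are copied from the question.

```python
import numpy as np
from numpy.random import default_rng
from scipy.stats import norm


def sorted_cdf_points(samples):
    """Return (x, F(x)) with x sorted, so a line plot traces the cdf.

    Hypothetical helper for illustration; not part of scipy.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    return x, norm.cdf(x)


rg = default_rng(12345)
b = norm.rvs(size=1000, random_state=rg)  # same draw as in the question
x, c = sorted_cdf_points(b)

# Both arrays are non-decreasing, so c plotted against x rises from ~0 to ~1.
print(x[:3], c[:3])
```

With an existing Axes object, ax.plot(x, c) would then draw the curve through the sampled points rather than through an evenly spaced grid.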

Answer 2:

To plot the sigmoidal shape of the CDF of normally distributed random variates, I should not have used matplotlib's hist() function. Instead, I could have used the bar() function to plot each cdf value against its variate.

The answers by @Laaggan and @dumbPy state that using regularly spaced, ordered x values is the way to obtain the sigmoidal cdf curve. Though commonly done, that isn't applicable when random variates are used. I have compared their approach with mine to show that both give the same curve. However, my results (see the figure below) do show that the usual approach of getting the cdf values yields more occurrences of the extreme values of a normal distribution than using random variates does. Excluding the two extremes, the occurrences appear uniformly distributed.
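One possible explanation for the extra occurrences at the extremes: the cdf is nearly flat in the tails, so on an evenly spaced grid many x values map to cdf values close to 0 or 1. The sketch below counts cdf values per tenth of [0, 1] to illustrate this. The grid limits of ±3.26 and the 1000 points are taken from the values quoted in this answer; the choice of 10 bins is an assumption made for the illustration.

```python
import numpy as np
from scipy.stats import norm

# Evenly spaced, ordered x values over roughly the range of the variates.
c_x = np.linspace(-3.26, 3.26, 1000)
c_cdf = norm.cdf(c_x)

# Count how many cdf values fall in each tenth of [0, 1].
counts, edges = np.histogram(c_cdf, bins=10, range=(0.0, 1.0))

# The first and last bins collect the long flat tails of the cdf,
# while the middle bins cover only a narrow slice of x around 0.
print(counts)
```

Random variates do not show this pile-up because they are already concentrated near the mean, which offsets the steep middle part of the cdf.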

I have revised my script and provided comments to demonstrate how I compared the two approaches. I hope my answer can benefit others who are learning to use the rvs(), pdf(), and cdf() functions of the scipy.stats.norm class.

import numpy as np
from numpy.random import default_rng
from scipy.stats import norm
import matplotlib.pyplot as plt

mu = 0
sigma = 1
samples = 1000

rg = default_rng( 12345 )
a = rg.random(size=samples)  # Uniformly distributed numbers in the half-open range [0, 1).
print( 'a = ', a)

# Draw normal random variates, then evaluate the cdf at each variate.
rg = default_rng( 12345 )  # Recreate the generator so it restarts from the same seed.
b_pdf = norm.rvs( loc=mu, scale=sigma, size=samples, random_state=rg )  # Random variates, not pdf values, despite the name (mu=0, sigma=1 gives about -3.26 to +3.26).
b_cdf = norm.cdf( b_pdf, loc=mu, scale=sigma )  # cdf evaluated at the variates (always between 0 and 1).
print( 'b_pdf = ', b_pdf)
print( 'b_cdf = ', b_cdf)

# For comparison, evaluate the pdf and cdf on evenly spaced, ordered x values (the common approach).
c_x = np.linspace( mu - 3.26*sigma, mu + 3.26*sigma, samples )
c_pdf = norm.pdf( c_x, loc=mu, scale=sigma )
c_cdf = norm.cdf( c_x, loc=mu, scale=sigma  )
print( 'c_x = ', c_x )
print( 'c_pdf = ', c_pdf )
print( 'c_cdf = ', c_cdf )


fig, ax = plt.subplots(3, 1)
acount, abins, aignored = ax[0].hist( a, bins=50, histtype='bar', label='a', color='C0', alpha=0.2, density=True  )
bcount, bbins, bignored = ax[0].hist( b_cdf, bins=50, histtype='bar', label='b_cdf', color='C1', alpha=0.2, density=True )
ccount, cbins, cignored = ax[0].hist( c_cdf, bins=50, histtype='bar', label='c_cdf', color='C2', alpha=0.2, density=True )

bcount, bbins, bignored = ax[1].hist( b_pdf, bins=20, histtype='bar', label='b_pdf', color='C1', alpha=0.4, density=True  )
cpdf_line = ax[1].plot(c_x, c_pdf, label='c_pdf', color='C2')

bpdf_bar = ax[2].bar( b_pdf, b_cdf, label='b_cdf', color='C1', alpha=0.4, width=0.01)
ccdf_line = ax[2].plot(c_x, c_cdf, label='c_cdf', color='C2')
print( 'acount, abins, aignored = ', acount, abins, aignored)
print( 'bcount, bbins, bignored = ', bcount, bbins, bignored)
print( 'ccount, cbins, cignored = ', ccount, cbins, cignored)

ax[0].legend(loc='upper left')
ax[1].legend(loc='upper left')
ax[2].legend(loc='upper left')
plt.show()
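An alternative to bar() that also works directly on random variates is the empirical cdf: sort the sample and pair each value with its rank divided by the sample size. This is offered only as a sketch, not as the method used in this answer. The function name empirical_cdf is hypothetical, and the seed and sample size are copied from the script above.

```python
import numpy as np
from numpy.random import default_rng
from scipy.stats import norm


def empirical_cdf(samples):
    """Return sorted samples and the fraction of samples <= each value.

    Hypothetical helper for illustration; not part of scipy.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    p = np.arange(1, x.size + 1) / x.size
    return x, p


rg = default_rng(12345)
variates = norm.rvs(loc=0, scale=1, size=1000, random_state=rg)
x, p = empirical_cdf(variates)

# Largest gap between the empirical cdf and the theoretical cdf at the samples.
max_gap = np.max(np.abs(p - norm.cdf(x)))
print(max_gap)
```

With an existing Axes object, ax.step(x, p, where='post') draws the empirical curve and ax.plot(x, norm.cdf(x)) overlays the theoretical one; for a sample of this size the two should nearly coincide.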

Answer 3:

You are plotting the wrong values. When you do b = norm.rvs(size=siz, random_state=rg), what you get is 1000 independently drawn random samples from the standard normal distribution, i.e., z values,

hence their histogram is the bell-shaped curve you see.

norm.cdf returns the cdf value at a given z value. If you want the cdf's S curve, take evenly spaced z values from -3 to 3, compute the cdf at each of them, and then plot the resulting probabilities against z.

EDIT: The other answer gives code for this approach so I won't bother adding again.
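As a side note on why the histogram of c in the question looks flat: for a continuous distribution, applying its own cdf to samples drawn from it gives values that are uniformly distributed on (0, 1). The sketch below checks this numerically with the seed and sample size from the question; the 10-bin histogram and the use of scipy.stats.kstest are choices made for this illustration.

```python
import numpy as np
from numpy.random import default_rng
from scipy.stats import kstest, norm

rg = default_rng(12345)
b = norm.rvs(size=1000, random_state=rg)  # normal variates (z values)
c = norm.cdf(b)                           # cdf evaluated at each variate

# Counts per tenth of [0, 1]; each bin should hold roughly 100 values.
counts, edges = np.histogram(c, bins=10, range=(0.0, 1.0))
print(counts)

# Kolmogorov-Smirnov test of c against the uniform distribution on [0, 1].
result = kstest(c, "uniform")
print(result.statistic, result.pvalue)
```

A small test statistic here indicates that c is close to uniform, which matches the flat histogram the question reports.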

Source: https://stackoverflow.com/questions/63217004/how-to-get-a-sigmodal-cdf-curve-use-scipy-stats-norm-cdf-and-matplotlib

Tags

python

numpy

matplotlib

scipy