Scipy: lognormal fitting

渐次进展 2020-12-25 14:39

There have been quite a few posts on handling the lognorm distribution with SciPy, but I still don't get the hang of it.

The 2 parameter lognormal is usua

5 answers
  • 2020-12-25 15:05

    I answered here.

    I'm leaving the code here too, for the lazy :D

    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    import scipy.stats  # `import scipy` alone does not load scipy.stats in older SciPy versions
    
    mu = 10      # Mean of the normal variable X
    sigma = 1.5  # Standard deviation of X
    N = 2000     # Number of samples
    
    norm_dist = scipy.stats.norm(loc=mu, scale=sigma)  # Create random process
    x = norm_dist.rvs(size=N)  # Generate samples
    
    # Fit normal
    fitting_params = scipy.stats.norm.fit(x)
    norm_dist_fitted = scipy.stats.norm(*fitting_params)
    t = np.linspace(np.min(x), np.max(x), 100)
    
    # Plot normals (histplot(stat='density') replaces the deprecated distplot(norm_hist=True))
    f, ax = plt.subplots(1, figsize=(10, 5))
    sns.histplot(x, ax=ax, stat='density', label='Data X~N(mu={0:.1f}, sigma={1:.1f})'.format(mu, sigma))
    ax.plot(t, norm_dist_fitted.pdf(t), lw=2, color='r',
            label='Fitted Model X~N(mu={0:.1f}, sigma={1:.1f})'.format(norm_dist_fitted.mean(), norm_dist_fitted.std()))
    ax.plot(t, norm_dist.pdf(t), lw=2, color='g', ls=':',
            label='Original Model X~N(mu={0:.1f}, sigma={1:.1f})'.format(norm_dist.mean(), norm_dist.std()))
    ax.legend(loc='lower right')
    plt.show()
    
    
    # The lognormal model fits a variable whose log is normal.
    # We create such a variable by exponentiating the previous one: Y = exp(X).
    x_exp = np.exp(x)
    mu_exp = np.exp(mu)
    
    # loc is fixed at 0 (two-parameter lognormal); scale=mu_exp is only a starting guess
    fitting_params_lognormal = scipy.stats.lognorm.fit(x_exp, floc=0, scale=mu_exp)
    lognorm_dist_fitted = scipy.stats.lognorm(*fitting_params_lognormal)
    s_fit, _, scale_fit = fitting_params_lognormal  # sigma = s, mu = log(scale)
    t = np.linspace(np.min(x_exp), np.max(x_exp), 100)
    
    # The key step: exp(X) with X ~ N(mu, sigma) is lognorm(s=sigma, loc=0, scale=exp(mu))
    lognorm_dist = scipy.stats.lognorm(s=sigma, loc=0, scale=np.exp(mu))
    # Plot lognormals
    f, ax = plt.subplots(1, figsize=(10, 5))
    sns.histplot(x_exp, ax=ax, stat='density',
                 label='Data X~N(mu={0:.1f}, sigma={1:.1f})\n exp(X)~LogNorm(mu={0:.1f}, sigma={1:.1f})'.format(mu, sigma))
    ax.plot(t, lognorm_dist_fitted.pdf(t), lw=2, color='r',
            label='Fitted Model exp(X)~LogNorm(mu={0:.1f}, sigma={1:.1f})'.format(np.log(scale_fit), s_fit))
    ax.plot(t, lognorm_dist.pdf(t), lw=2, color='g', ls=':',
            label='Original Model exp(X)~LogNorm(mu={0:.1f}, sigma={1:.1f})'.format(mu, sigma))
    ax.legend(loc='lower right')
    plt.show()
    

    The trick is to understand these two things:

    1. If X is NORMAL with mean MU and standard deviation SIGMA, then EXP(X) ~ scipy.stats.lognorm(s=sigma, loc=0, scale=np.exp(mu))
    2. If your variable (x) HAS THE FORM of a LOGNORMAL, the model will be scipy.stats.lognorm(s=sigmaX, loc=0, scale=np.exp(muX)) with:
      • muX = np.mean(np.log(x))
      • sigmaX = np.std(np.log(x))
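    Rule 1 can be checked numerically. A minimal sketch, assuming SciPy's standard `lognorm(s, loc, scale)` parameterization; the helper name `lognorm_from_normal` is hypothetical. It confirms that the density of `lognorm(s=sigma, scale=exp(mu))` equals the normal density of `log(y)` divided by `y` (change of variables), and that its median and mean are `exp(mu)` and `exp(mu + sigma**2/2)`.

```python
import numpy as np
from scipy import stats


def lognorm_from_normal(mu, sigma):
    """Hypothetical helper: frozen lognormal for Y = exp(X), X ~ N(mu, sigma)."""
    # SciPy's shape s is the std of log(Y); scale is exp(mu); loc stays 0.
    return stats.lognorm(s=sigma, loc=0, scale=np.exp(mu))


mu, sigma = 10.0, 1.5
dist = lognorm_from_normal(mu, sigma)

# Rule 1 via change of variables: pdf_Y(y) = pdf_X(log y) / y for y > 0.
y = np.exp(np.linspace(mu - 3 * sigma, mu + 3 * sigma, 7))
pdf_direct = dist.pdf(y)
pdf_via_normal = stats.norm(loc=mu, scale=sigma).pdf(np.log(y)) / y
print(np.allclose(pdf_direct, pdf_via_normal, rtol=1e-9, atol=0))  # True

# Median and mean of the lognormal expressed through mu and sigma.
print(np.isclose(dist.median(), np.exp(mu)))                # True
print(np.isclose(dist.mean(), np.exp(mu + sigma**2 / 2)))   # True
```

    Note that `scale` is `exp(mu)`, not `mu`: it is the median of the lognormal, which is why passing the mean of `log(x)` directly as `scale` gives a badly wrong model.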
  • 2020-12-25 15:12

    I made the same observations: a free fit of all parameters fails most of the time. You can help by providing a better initial guess; fixing the parameter is not necessary.

    from scipy import stats
    
    samp = stats.lognorm(0.5, loc=0, scale=1).rvs(size=2000)
    # outputs below are from the original run; the sample is random, so yours will differ
    
    # this is where the fit gets its initial guess from
    print(stats.lognorm._fitstart(samp))
    # (1.0, 0.66628696413404565, 0.28031095750445462)
    
    print(stats.lognorm.fit(samp))
    # (1.0, 0.66628696413404565, 0.28031095750445462)
    # note that the fit failed completely, as the parameters did not change at all
    
    # fit again with a better initial guess for loc
    print(stats.lognorm.fit(samp, loc=0))
    # (0.50146296628099118, 0.0011019321419653122, 0.99361128537912125)
    

    You can also write your own function to compute the initial guess, e.g.:

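    import numpy as np
    from scipy import stats
    
    def your_func(sample, args=None):
        # Example guess (s, loc, scale) from the moments of log(sample),
        # assuming loc is close to 0 and the sample is strictly positive.
        # args=None mirrors the signature of the private rv_continuous._fitstart.
        log_sample = np.log(sample)
        return np.std(log_sample), 0.0, np.exp(np.mean(log_sample))
    
    # _fitstart is private SciPy API and may change between versions
    stats.lognorm._fitstart = your_func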
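    Monkey-patching the private `_fitstart` works, but the generic `rv_continuous.fit` already treats positional shape arguments and the `loc`/`scale` keywords as starting values. A minimal sketch, assuming the sample is close to a two-parameter lognormal; the helper `log_moment_guess` is hypothetical and uses the moments of `log(x)` as the start point while leaving all three parameters free. Recent SciPy versions special-case `lognorm.fit` (for example, with `floc` fixed it uses closed-form estimates), so how much the guess matters depends on your version.

```python
import numpy as np
from scipy import stats


def log_moment_guess(sample):
    """Hypothetical helper: (s, loc, scale) start values assuming loc is near 0."""
    log_x = np.log(sample)  # requires strictly positive data
    return np.std(log_x), 0.0, np.exp(np.mean(log_x))


rng = np.random.default_rng(0)
samp = stats.lognorm(0.5, loc=0, scale=1).rvs(size=2000, random_state=rng)

s0, loc0, scale0 = log_moment_guess(samp)
# The positional shape value and the loc/scale keywords are starting guesses;
# nothing is fixed, so the optimizer is still free to move loc away from 0.
s_hat, loc_hat, scale_hat = stats.lognorm.fit(samp, s0, loc=loc0, scale=scale0)
print("guess:", s0, loc0, scale0)
print("fit:  ", s_hat, loc_hat, scale_hat)
```

    Using the public keywords avoids depending on a private method whose signature may change between SciPy releases.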
    
  • 2020-12-25 15:12

    If you are just interested in plotting, you can use seaborn to get a lognormal distribution.

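    import matplotlib.pyplot as plt
    import numpy as np
    import seaborn as sns
    from scipy import stats
    
    mu = 0
    sigma = 1
    n = 1000
    
    # Normal distribution
    x = np.random.normal(mu, sigma, n)
    sns.distplot(x, fit=stats.norm)  # note: distplot is deprecated since seaborn 0.11
    plt.show()
    
    # Lognormal distribution: np.random.lognormal takes the mean and sigma of the
    # underlying normal, and its samples are already lognormal (do not take np.log)
    x = np.random.lognormal(mu, sigma, n)
    sns.distplot(x, fit=stats.lognorm)
    plt.show()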
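    Because `distplot` (and its `fit=` option) is deprecated, here is a minimal sketch of the same lognormal plot using only Matplotlib, assuming the data are strictly positive and a two-parameter lognormal is appropriate. The helper `plot_lognorm_fit` is hypothetical. Unlike `distplot`'s free three-parameter fit, it fixes `floc=0`, as recommended elsewhere in this thread, and overlays the fitted pdf on a density-normalized histogram.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats


def plot_lognorm_fit(data, ax, bins=50):
    """Hypothetical helper: density histogram of data plus a fitted lognormal pdf."""
    # Fix loc at 0 so only the shape s and the scale are estimated.
    s, loc, scale = stats.lognorm.fit(data, floc=0)
    ax.hist(data, bins=bins, density=True, alpha=0.5, label="data")
    t = np.linspace(data.min(), data.max(), 200)
    ax.plot(t, stats.lognorm.pdf(t, s, loc=loc, scale=scale), lw=2, color="r",
            label=f"lognorm fit (s={s:.2f}, scale={scale:.2f})")
    ax.legend()
    return s, loc, scale


rng = np.random.default_rng(1)
# Generator.lognormal takes the mean and sigma of the underlying normal.
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

fig, ax = plt.subplots(figsize=(8, 4))
params = plot_lognorm_fit(x, ax)
plt.show()
```

    For `mean=0` and `sigma=1`, the fitted `s` should come out near 1 and `scale` near `exp(0) = 1`.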
    
  • 2020-12-25 15:19

    I realized my mistakes:

    A) The samples I am drawing need to come from the .rvs method, like so: samp = sp.stats.lognorm.rvs(3, loc=0, scale=np.exp(10), size=2000)

    B) The fit has some problems. When I fix the loc parameter, the fit succeeds much better: param = sp.stats.lognorm.fit(samp, floc=0)
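    Points A and B combined, with the parameters from the question (`s=3`, `scale=np.exp(10)`). A minimal sketch, assuming a reasonably recent SciPy: with `floc=0`, the maximum-likelihood estimates of a two-parameter lognormal are the moments of `log(x)` (population standard deviation for `s`, `exp` of the mean for `scale`), so the fitted values and the closed-form values should agree closely. On older SciPy the fit is purely numerical and agrees only approximately.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# A) draw the sample with .rvs, using the question's parameters
samp = stats.lognorm.rvs(3, loc=0, scale=np.exp(10), size=2000, random_state=rng)

# B) fit with loc fixed at 0
s_fit, loc_fit, scale_fit = stats.lognorm.fit(samp, floc=0)

# Closed-form MLE for a two-parameter lognormal: moments of log(samp).
log_samp = np.log(samp)
s_mle = np.std(log_samp)               # ddof=0, as in maximum likelihood
scale_mle = np.exp(np.mean(log_samp))

print("shape s:   ", s_fit, s_mle)
print("log(scale):", np.log(scale_fit), np.log(scale_mle))
```

    Reporting `log(scale)` makes the result directly comparable with the `mu = 10` used to generate the data.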

  • 2020-12-25 15:20

    This problem has been fixed in newer SciPy versions. After upgrading from SciPy 0.9 to SciPy 0.14, the problem disappears.
