How to search for the common best-fit distribution across the result lists of two goodness-of-fit tests?

Submitted by 别等时光非礼了梦想 on 2020-05-17 05:54:13

Question


I looked into the question "Best fit Distribution plots" and found that the answers use the Kolmogorov-Smirnov test to find the best-fitting distribution. I also found that the Anderson-Darling test can be used to find the best-fitting distribution for given data. So, I have a few questions:

Question 1:

If I want to combine both tests, how can I search for the distribution that does best in both (that is, find the highest p-value that is common to both tests, then extract that distribution's name together with its p-values)? Which parameters are best to use for finding the best-fitting distribution and ranking the candidates (like the photo below, "Goodness-to-fit tests with ranking")? Here is my attempt at combining both tests.

import scipy.stats as st
from statsmodels.stats.diagnostic import anderson_statistic as adtest

def get_best_distribution(data):
    dist_names = ['alpha', 'anglit', 'arcsine', 'beta', 'betaprime', 'bradford', 'burr', 'cauchy',
                  'chi', 'chi2', 'cosine', 'dgamma', 'dweibull', 'erlang', 'expon', 'exponweib',
                  'exponpow', 'f', 'fatiguelife', 'fisk', 'foldcauchy', 'foldnorm', 'frechet_r',
                  'frechet_l', 'genlogistic', 'genpareto', 'genexpon', 'genextreme', 'gausshyper',
                  'gamma', 'gengamma', 'genhalflogistic', 'gilbrat', 'gompertz', 'gumbel_r',
                  'gumbel_l', 'halfcauchy', 'halflogistic', 'halfnorm', 'hypsecant', 'invgamma',
                  'invgauss', 'invweibull', 'johnsonsb', 'johnsonsu', 'ksone', 'kstwobign',
                  'laplace', 'logistic', 'loggamma', 'loglaplace', 'lognorm', 'lomax', 'maxwell',
                  'mielke', 'moyal', 'nakagami', 'ncx2', 'ncf', 'nct', 'norm', 'pareto', 'pearson3',
                  'powerlaw', 'powerlognorm', 'powernorm', 'rdist', 'reciprocal', 'rayleigh', 'rice',
                  'recipinvgauss', 'semicircular', 't', 'triang', 'truncexpon', 'truncnorm',
                  'tukeylambda', 'uniform', 'vonmises', 'wald', 'weibull_min', 'weibull_max',
                  'wrapcauchy']
    dist_ks_results = []
    dist_ad_results = []
    params = {}
    for dist_name in dist_names:
        # Skip names that the installed SciPy version does not provide
        dist = getattr(st, dist_name, None)
        if dist is None:
            continue
        try:
            param = dist.fit(data)
        except Exception as exc:
            print("Could not fit " + dist_name + ": " + str(exc))
            continue
        params[dist_name] = param

        # Applying the Kolmogorov-Smirnov test (the p-value is what gets stored)
        D_ks, p_ks = st.kstest(data, dist_name, args=param)
        print("Kolmogorov-Smirnov test Statistics value for " + dist_name + " = " + str(D_ks))
        # print("p value for " + dist_name + " = " + str(p_ks))
        dist_ks_results.append((dist_name, p_ks))

        # Applying the Anderson-Darling test (returns the statistic only, no p-value)
        D_ad = adtest(x=data, dist=dist, fit=False, params=param)
        print("Anderson-Darling test Statistics value for " + dist_name + " = " + str(D_ad))
        dist_ad_results.append((dist_name, D_ad))

    print(dist_ks_results)
    print(dist_ad_results)

    # Keep the last distribution that passes both thresholds.
    # Note: KS_D is a p-value here, while AD_D is a test statistic.
    best_ks_D = best_ad_D = best_ks_dist = best_ad_dist = None
    for D in range(len(dist_ks_results)):
        KS_D = dist_ks_results[D][1]
        AD_D = dist_ad_results[D][1]
        if KS_D < 0.25 and AD_D < 0.05:
            best_ks_D = KS_D
            best_ad_D = AD_D
            best_ks_dist = dist_ks_results[D][0]
            best_ad_dist = dist_ad_results[D][0]

    if best_ks_dist is None:
        print("No distribution passed both thresholds")
        return None

    print('\n################################ Kolmogorov-Smirnov test parameters #####################################')
    print("Best fitting distribution (KS test): " + str(best_ks_dist))
    print("Best test Statistics value (KS test): " + str(best_ks_D))
    print("Parameters for the best fit (KS test): " + str(params[best_ks_dist]))
    print('################################################################################\n')
    print('################################ Anderson-Darling test parameters #########################################')
    print("Best fitting distribution (AD test): " + str(best_ad_dist))
    print("Best test Statistics value (AD test): " + str(best_ad_D))
    print("Parameters for the best fit (AD test): " + str(params[best_ad_dist]))
    print('################################################################################\n')
    return best_ks_dist, params[best_ks_dist]
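One possible way to look for a distribution that does well in both tests is to rank the candidates by each test statistic separately and then add the ranks. The following is only a sketch under that assumption. The helper names `ad_statistic` and `rank_distributions`, the short candidate list, and the synthetic sample are all made up for illustration, and the Anderson-Darling statistic is computed directly with NumPy rather than through statsmodels.

```python
import numpy as np
import scipy.stats as st


def ad_statistic(data, dist, params):
    """Anderson-Darling A^2 statistic of `data` against a fully specified distribution."""
    y = np.sort(np.asarray(data, dtype=float))
    n = y.size
    # Clip the CDF away from 0 and 1 so the logarithms stay finite
    z = np.clip(dist.cdf(y, *params), 1e-12, 1 - 1e-12)
    i = np.arange(1, n + 1)
    return -n - np.sum((2 * i - 1) * (np.log(z) + np.log1p(-z[::-1]))) / n


def rank_distributions(data, dist_names):
    """Return rows (name, ks_stat, ks_p, ad_stat, rank_sum), best candidate first."""
    rows = []
    for name in dist_names:
        dist = getattr(st, name, None)
        if dist is None:  # name not available in this SciPy version
            continue
        params = dist.fit(data)
        ks_stat, ks_p = st.kstest(data, name, args=params)
        rows.append((name, ks_stat, ks_p, ad_statistic(data, dist, params)))
    # Rank each statistic separately (1 = smallest = best), then add the two ranks
    ks_rank = st.rankdata([r[1] for r in rows])
    ad_rank = st.rankdata([r[3] for r in rows])
    ranked = [r + (k + a,) for r, k, a in zip(rows, ks_rank, ad_rank)]
    return sorted(ranked, key=lambda r: r[4])


# Synthetic sample, used only to exercise the sketch
rng = np.random.default_rng(0)
sample = rng.normal(loc=10.0, scale=2.0, size=300)
ranking = rank_distributions(sample, ["norm", "expon", "uniform", "logistic"])
for name, ks_stat, ks_p, ad_stat, rank_sum in ranking:
    print(f"{name:10s} KS D={ks_stat:.4f} p={ks_p:.4f}  AD A2={ad_stat:.4f}  rank sum={rank_sum}")
```

Ranking on the statistics instead of the p-values avoids mixing a p-value from one test with a raw statistic from the other. Summing ranks is just one simple choice of combination rule.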

Edit 1:

I am not sure, but is normal_ad from statsmodels a general Anderson-Darling test for any continuous probability distribution?

Edit 2:

Some of the distributions have the same p-value. How can I find the best-fitting distribution when the p-values are tied? Should I look at the test statistics?
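If the test statistic is used as the tie-breaker, one way to sketch it is to sort on a compound key: p-value descending, then statistic ascending. The tuples below are invented numbers for illustration only, and the layout `(name, p_value, D_statistic)` is an assumption, not what the code above stores.

```python
# Hypothetical results: (distribution name, KS p-value, KS statistic D)
results = [
    ("gamma", 0.91, 0.031),
    ("lognorm", 0.91, 0.027),
    ("norm", 0.40, 0.058),
]

# Highest p-value first; among equal p-values, the smallest statistic wins
ordered = sorted(results, key=lambda item: (-item[1], item[2]))
best_name, best_p, best_D = ordered[0]
print(best_name, best_p, best_D)
```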

Edit 3:

I know the selection happens in the lines below:

# select the best fitted distribution:
# best_dist, best_p = (max(dist_ks_results, key=lambda item: item[1]))
# best_dist, best_p, best_D = (max(dist_ks_results, key=lambda item: item[1]) and [dist_ks_results, key=lambda item: item[2] if item < 0.05])
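# keep entries whose statistic is below 0.05, then take the one with the highest p-value
# (assumes each entry is (dist_name, p_value, D_statistic) and that at least one entry qualifies)
candidates = [item for item in dist_ks_results if item[2] < 0.05]
best_dist, best_p, best_D = max(candidates, key=lambda item: item[1])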
# best_dist, best_p, best_D = (max(dist_ks_results, key=lambda item: item[1]) and min(dist_ks_results, key=lambda item: item[2]))
# best_dist, best_p = (max(dist_ad_results, key=lambda item: item[1]))
# best_dist, best_p, best_D = (max(dist_ks_results, key=lambda item: item[1]) & max(dist_ad_results, key=lambda item: item[1]))
# store the name of the best fit and its p value

But I am not sure whether I am doing it right.

Question 2:

How can I obtain the p-value for the Anderson-Darling test?
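One generic route, sketched here as an assumption rather than a definitive answer, is a parametric bootstrap: fit the distribution, compute the Anderson-Darling statistic, then repeatedly simulate from the fitted model, refit, and count how often the simulated statistic is at least as large as the observed one. The helper names `ad_statistic` and `ad_pvalue`, the number of simulations, and the synthetic data are all illustrative choices.

```python
import numpy as np
import scipy.stats as st


def ad_statistic(data, dist, params):
    """Anderson-Darling A^2 statistic of `data` against a fully specified distribution."""
    y = np.sort(np.asarray(data, dtype=float))
    n = y.size
    # Clip the CDF away from 0 and 1 so the logarithms stay finite
    z = np.clip(dist.cdf(y, *params), 1e-12, 1 - 1e-12)
    i = np.arange(1, n + 1)
    return -n - np.sum((2 * i - 1) * (np.log(z) + np.log1p(-z[::-1]))) / n


def ad_pvalue(data, dist, n_sim=200, seed=0):
    """Parametric-bootstrap p-value for the AD statistic with fitted parameters."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    params = dist.fit(data)
    observed = ad_statistic(data, dist, params)
    exceed = 0
    for _ in range(n_sim):
        # Simulate from the fitted model, refit, and recompute the statistic
        sim = dist.rvs(*params, size=data.size, random_state=rng)
        if ad_statistic(sim, dist, dist.fit(sim)) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive
    return observed, (exceed + 1) / (n_sim + 1)


# Synthetic data: one sample that is normal, one that clearly is not
rng = np.random.default_rng(1)
normal_data = rng.normal(size=200)
skewed_data = rng.exponential(size=200)
stat_ok, p_ok = ad_pvalue(normal_data, st.norm)
stat_bad, p_bad = ad_pvalue(skewed_data, st.norm)
print("normal sample vs norm:", stat_ok, p_ok)
print("skewed sample vs norm:", stat_bad, p_bad)
```

Refitting inside the loop matters because the parameters were estimated from the data. It can be slow for distributions whose `fit` is purely numerical. Recent SciPy releases also provide `scipy.stats.goodness_of_fit`, which appears to implement a similar Monte Carlo procedure and may be preferable where available.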

Question 3:

Correct me if I am wrong: when implementing a goodness-of-fit test, the p-value is used to check whether the given values fit any of the listed distributions. So the maximum p-value means that the p-value lies below the 5% significance level, and therefore, for example, the Gamma distribution fits the data. Am I right, or did I misunderstand the main concept of the goodness-of-fit test?
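For reference, the usual convention is the other way around: a p-value below the significance level is evidence against the hypothesized distribution, while a p-value above it only means the test failed to reject it. The small sketch below illustrates that convention. The `verdict` helper, the 5% level, and the synthetic sample are assumptions made for the example.

```python
import numpy as np
import scipy.stats as st

ALPHA = 0.05  # conventional significance level, an assumption for this example


def verdict(p_value, alpha=ALPHA):
    """Translate a goodness-of-fit p-value into the usual decision."""
    if p_value < alpha:
        return "reject: data are unlikely under this distribution"
    return "fail to reject: data are compatible with this distribution"


rng = np.random.default_rng(42)
sample = rng.normal(size=500)

# Fully specified null hypotheses (no parameters estimated from the sample)
p_norm = st.kstest(sample, "norm").pvalue
p_expon = st.kstest(sample, "expon").pvalue
print("norm :", p_norm, verdict(p_norm))
print("expon:", p_expon, verdict(p_expon))
```

When the parameters are estimated from the same sample, as in the attempt above, the standard KS p-values tend to be too large, so they are better read as a relative score than as exact probabilities.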

Source: https://stackoverflow.com/questions/61645648/how-to-create-a-search-for-common-fit-distribution-of-two-goodness-to-fit-tests
