How to find probability distribution and parameters for real data? (Python 3)

暖寄归人 2020-12-02 06:44

I have a dataset from sklearn and I plotted the distribution of the load_diabetes.target data (i.e. the values of the regression that the loa

4 Answers
  •  抹茶落季
    2020-12-02 07:07

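    Use this approach: fit each candidate distribution with scipy.stats, then keep the one with the highest Kolmogorov-Smirnov p-value.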

    import scipy.stats as st

    def get_best_distribution(data):
        """Fit several candidate distributions and return the one with the highest KS p-value."""
        dist_names = ["norm", "exponweib", "weibull_max", "weibull_min", "pareto", "genextreme"]
        dist_results = []
        params = {}
        for dist_name in dist_names:
            # Look up the distribution by name and fit its parameters by maximum likelihood
            dist = getattr(st, dist_name)
            param = dist.fit(data)
            params[dist_name] = param

            # Kolmogorov-Smirnov test of the data against the fitted distribution
            D, p = st.kstest(data, dist_name, args=param)
            print(f"p value for {dist_name} = {p}")
            dist_results.append((dist_name, p))

        # Select the distribution with the highest p-value
        best_dist, best_p = max(dist_results, key=lambda item: item[1])

        print(f"Best fitting distribution: {best_dist}")
        print(f"Best p value: {best_p}")
        print(f"Parameters for the best fit: {params[best_dist]}")

        return best_dist, best_p, params[best_dist]
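
A minimal usage sketch of the same fit-then-KS-test idea, in case it helps to see it run end to end. The helper name `rank_distributions`, the shorter candidate list, the skip-on-failure behavior, and the synthetic sample (a stand-in for `load_diabetes().target`, not the real data) are all assumptions made for illustration.

```python
import numpy as np
import scipy.stats as st


def rank_distributions(data, dist_names=("norm", "weibull_min", "genextreme")):
    """Fit each named scipy.stats distribution and rank the fits by KS p-value.

    Hypothetical helper for illustration; it is not part of the answer above.
    """
    results = []
    for name in dist_names:
        dist = getattr(st, name)
        try:
            params = dist.fit(data)
        except Exception:
            # Assumption: a candidate whose fit fails is skipped, not fatal
            continue
        statistic, p_value = st.kstest(data, name, args=params)
        if np.isnan(p_value):
            # Assumption: an undefined p-value is treated like a failed fit
            continue
        results.append((name, p_value, params))
    # Highest p-value first
    return sorted(results, key=lambda item: item[1], reverse=True)


# Synthetic stand-in for load_diabetes().target (assumed values, not the real data)
rng = np.random.default_rng(0)
sample = rng.normal(loc=150.0, scale=30.0, size=500)

ranking = rank_distributions(sample)
best_name, best_p, best_params = ranking[0]
print(f"Best fit: {best_name} (p = {best_p:.3f})")

# Freeze the winning distribution to evaluate its pdf, quantiles, or draw samples
best = getattr(st, best_name)(*best_params)
print("Fitted median:", best.median())
```

Because the parameters are estimated from the same sample that is then tested, the KS p-values tend to be optimistic, so the ranking is probably best read as a rough guide rather than a formal goodness-of-fit result.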
    
