Given a set of random numbers drawn from a continuous univariate distribution, find the distribution

情书的邮戳 2020-12-04 08:27

Given a set of real numbers drawn from an unknown continuous univariate distribution (let's say it is one of beta, Cauchy, chi-square, exponential, F, gamma, Laplace, log-no

6 answers
  •  悲哀的现实
    2020-12-04 09:24

    As others have pointed out, this can be framed as a model selection problem. Simply choosing the distribution that fits the data best, without taking its complexity into account, is the wrong approach: a more complex distribution (one with more parameters) will generally fit better, but it is also more likely to overfit the data.
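    A minimal sketch of the overfitting point, using simulated data and base R only (the sample and the helper name `negll_gamma` are illustrative, not from the question): the gamma family contains the exponential as the special case shape = 1, so its maximised log-likelihood can never be lower, even when the data really are exponential.

    ```r
    set.seed(42)
    x <- rexp(500, rate = 0.5)  # simulated exponential data (illustrative only)
    
    # Exponential (1 parameter): the MLE of the rate has the closed form 1 / mean(x)
    rate_hat <- 1 / mean(x)
    ll_exp <- sum(dexp(x, rate = rate_hat, log = TRUE))
    
    # Gamma (2 parameters: shape, rate): minimise the negative log-likelihood
    # numerically; parameters are optimised on the log scale so they stay positive
    negll_gamma <- function(logpar) {
      -sum(dgamma(x, shape = exp(logpar[1]), rate = exp(logpar[2]), log = TRUE))
    }
    # Start from the exponential fit (shape = 1), which the gamma family contains
    opt <- optim(c(0, log(rate_hat)), negll_gamma)
    ll_gamma <- -opt$value
    
    c(exponential = ll_exp, gamma = ll_gamma)  # gamma is never lower
    ```

    Because the true distribution here is exponential, whatever log-likelihood the extra shape parameter gains comes from fitting sampling noise.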

    You can use the Akaike Information Criterion (AIC) to take the complexity of the distribution into account. This is still not fully satisfactory, since you are only considering a limited set of candidate distributions, but it is better than comparing log-likelihoods alone.
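    AIC is defined as 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximised likelihood; lower is better. A minimal base-R sketch on simulated normal data (not the question's data) computes it by hand and cross-checks it against `AIC()` on an intercept-only `lm()`, which is a normal model with the same two parameters:

    ```r
    set.seed(1)
    x <- rnorm(200, mean = 10, sd = 3)  # simulated data, illustrative only
    
    # Normal MLEs: sample mean and the 1/n standard deviation
    mu_hat <- mean(x)
    sigma_hat <- sqrt(mean((x - mu_hat)^2))
    loglik <- sum(dnorm(x, mean = mu_hat, sd = sigma_hat, log = TRUE))
    
    k <- 2  # number of estimated parameters (mean and sd)
    aic_manual <- 2 * k - 2 * loglik
    
    # Cross-check: an intercept-only lm() is a normal model with the same 2 parameters
    aic_base <- AIC(lm(x ~ 1))
    all.equal(aic_manual, aic_base)  # TRUE
    ```

    Note that among candidates with the same number of parameters (all of those used below except the one-parameter exponential), the penalty term is identical, so AIC ranks them exactly as the log-likelihood would.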

    I only use a few distributions here; check the documentation for others that could be relevant.
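    To narrow down the candidates, the `descdist()` call used below draws a Cullen and Frey graph, which plots the sample's squared skewness against its kurtosis alongside the values theoretical distributions take, for instance (0, 3) for the normal and (4, 9) for the exponential. A rough base-R sketch of the underlying moment statistics on simulated samples (the helper names are hypothetical, and `descdist()` may use bias-corrected estimators, so its numbers can differ slightly):

    ```r
    # Moment-based sample skewness and kurtosis (kurtosis, not excess kurtosis)
    sample_skewness <- function(x) {
      m <- mean(x)
      mean((x - m)^3) / mean((x - m)^2)^1.5
    }
    sample_kurtosis <- function(x) {
      m <- mean(x)
      mean((x - m)^4) / mean((x - m)^2)^2
    }
    
    set.seed(123)
    x_norm <- rnorm(1e5)  # normal: skewness 0, kurtosis 3
    x_exp <- rexp(1e5)    # exponential: skewness 2, kurtosis 9
    
    round(c(skewness = sample_skewness(x_norm), kurtosis = sample_kurtosis(x_norm)), 2)
    round(c(skewness = sample_skewness(x_exp), kurtosis = sample_kurtosis(x_exp)), 2)
    ```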

    Using the fitdistrplus package, you can run:

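    library(fitdistrplus)
    
    # Candidate distributions, named as in R's d/p/q/r functions (dnorm, dlnorm, ...)
    distributions = c("norm", "lnorm", "exp",
              "cauchy", "gamma", "logis",
              "weibull")
    
    # the x vector is defined as in the question
    
    # Plot to see which distributions make sense (a Cullen and Frey graph of
    # kurtosis vs. squared skewness, here with 500 bootstrap samples). This
    # should influence your choice of candidate distributions
    descdist(x, discrete = FALSE, boot = 500)
    
    # Fit each candidate by maximum likelihood (fitdist's default method)
    # and store both the fitted object and its AIC
    distr_aic = list()
    distr_fit = list()
    for (distribution in distributions) {
        distr_fit[[distribution]] = fitdist(x, distribution)
        distr_aic[[distribution]] = distr_fit[[distribution]]$aic
    }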
    
    > distr_aic
    $norm
    [1] 5032.269
    
    $lnorm
    [1] 5421.815
    
    $exp
    [1] 6602.334
    
    $cauchy
    [1] 5382.643
    
    $gamma
    [1] 5184.17
    
    $logis
    [1] 5047.796
    
    $weibull
    [1] 5058.336
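    Since `unlist(distr_aic)` turns this list into a named numeric vector, the lowest-AIC candidate can be picked programmatically. The sketch below copies the AIC values printed above so that it runs on its own; it also lists each candidate's AIC difference from the best one, which shows how clear-cut the choice is.

    ```r
    # AIC values copied from the distr_aic output above;
    # in the original session, unlist(distr_aic) gives the same named vector
    aic <- c(norm = 5032.269, lnorm = 5421.815, exp = 6602.334,
             cauchy = 5382.643, gamma = 5184.17, logis = 5047.796,
             weibull = 5058.336)
    
    best <- names(which.min(aic))
    best  # "norm"
    
    # Difference from the best model, smallest first
    delta <- sort(aic - min(aic))
    round(delta, 3)
    ```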
    

    According to the plot and the AIC, a normal distribution is the sensible choice: it has the lowest AIC of the candidates. You can automate this by picking the distribution with the minimum AIC. You can check the estimated parameters with

    > distr_fit[['norm']]
    Fitting of the distribution ' norm ' by maximum likelihood 
    Parameters:
         estimate Std. Error
    mean 9.975849 0.09454476
    sd   2.989768 0.06685321
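    For the normal, the maximum-likelihood estimates have closed forms: the sample mean and the standard deviation computed with 1/n rather than 1/(n - 1), with asymptotic standard errors `sigma_hat / sqrt(n)` and `sigma_hat / sqrt(2 * n)`. A base-R sketch on simulated data (n = 1000, mean 10 and sd 3 are arbitrary choices, not taken from the question) compares those formulas with a numerical fit that maximises the log-likelihood and inverts the Hessian:

    ```r
    set.seed(7)
    x <- rnorm(1000, mean = 10, sd = 3)  # simulated data, illustrative only
    n <- length(x)
    
    # Closed-form MLEs: sample mean and the 1/n (not 1/(n - 1)) standard deviation
    mu_hat <- mean(x)
    sigma_hat <- sqrt(mean((x - mu_hat)^2))
    
    # Asymptotic standard errors from the inverse Fisher information
    se_closed <- c(mean = sigma_hat / sqrt(n), sd = sigma_hat / sqrt(2 * n))
    
    # Numerical MLE: minimise the negative log-likelihood (sd bounded away from 0)
    negll <- function(p) -sum(dnorm(x, mean = p[1], sd = p[2], log = TRUE))
    opt <- optim(c(median(x), IQR(x) / 1.349), negll, method = "L-BFGS-B",
                 lower = c(-Inf, 1e-6), hessian = TRUE)
    # Standard errors = square roots of the diagonal of the inverse Hessian
    se_numeric <- sqrt(diag(solve(opt$hessian)))
    
    rbind(closed_form = c(mu_hat, sigma_hat), numeric = opt$par)
    rbind(closed_form = se_closed, numeric = se_numeric)
    ```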
    
