Estimate exponential cutoff in a power law distribution

痴心易碎 提交于 2019-12-03 06:36:20

The function scipy.stats.powerlaw.fit may still work for your purposes. It's a bit confusing how the distributions in scipy.stats work (the documentation for each one refers to the optional parameters loc and scale, even though not all of them use these parameters, and each uses them differently). If you look at the docs:

http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.powerlaw.html

there's also a second non-optional parameter "a", which is the "shape parameters". In the case of powerlaw, this contains a single parameter. Don't worry about "loc" and "scale".

Edit: Sorry, forgot that you wanted the beta parameter too. Your best best may be to define the powerlaw function you want yourself, and then use scipy's generic fitting algorithms to learn the parameters. For example: http://www.scipy.org/Cookbook/FittingData#head-5eba0779a34c07f5a596bbcf99dbc7886eac18e5

Here is a means of estimating the scaling exponent and exponential rate of power law with exponential cut-off by maximizing likelihood in R:

# Input: Data vector, lower threshold
# Output: List, giving type ("powerexp"), scaling exponent, exponential rate, lower threshold, log-likelihood


powerexp.fit <- function(data,threshold=1,method="constrOptim",initial_rate=-1) {
  x <- data[data>=threshold]
  negloglike <- function(theta) {
    -powerexp.loglike(x,threshold,exponent=theta[1],rate=theta[2])
  }
  # Fit a pure power-law distribution
  pure_powerlaw <- pareto.fit(data,threshold)
  # Use this as a first guess at the exponent
  initial_exponent <- pure_powerlaw$exponent
  if (initial_rate < 0) { initial_rate <- exp.fit(data,threshold)$rate }
  minute_rate <- 1e-6
  theta_0 <- as.vector(c(initial_exponent,initial_rate))
  theta_1 <- as.vector(c(initial_exponent,minute_rate))
  switch(method,
    constrOptim = {
      # Impose the constraint that rate >= 0
      # and that exponent >= -1
      ui <- rbind(c(1,0),c(0,1))
      ci <- c(-1,0)
      # Can't start with values on the boundary of the feasible set so add
      # tiny amounts just in case
      if (theta_0[1] == -1) {theta_0[1] <- theta_0[1] + minute_rate}
      if (theta_0[2] == 0) {theta_0[2] <- theta_0[2] + minute_rate}
      est <- constrOptim(theta=theta_0,f=negloglike,grad=NULL,ui=ui,ci=ci)
      alpha <- est$par[1]
      lambda <- est$par[2]
      loglike <- -est$value},
    optim = {
      est <- optim(par=theta_0,fn=negloglike)
      alpha <- est$par[1]
      lambda <- est$par[2]
      loglike <- -est$value},
    nlm = {
      est.0 <- nlm(f=negloglike,p=theta_0)
      est.1 <- nlm(f=negloglike,p=theta_1)
      est <- est.0
      if (-est.1$minimum > -est.0$minimum) { est <- est.1;cat("NLM had to switch\n") }
      alpha <- est$estimate[1]
      lambda <- est$estimate[2]
      loglike <- -est$minimum},
    {cat("Unknown method",method,"\n"); alpha<-NA; lambda<-NA; loglike<-NA}
  )
  fit <- list(type="powerexp", exponent=alpha, rate=lambda, xmin=threshold,
              loglike=loglike, samples.over.threshold=length(x))
  return(fit)
}

Check out https://github.com/jeffalstott/powerlaw/ for more info

Powerlaw library can directly be used to estimate the parameters as follows:

  1. Install all the pythons dependencies:

    pip install powerlaw mpmath scipy
    
  2. Run the powerlaw package fit in a python environment:

    import powerlaw
    data = [5, 4, ... ]
    results = powerlaw.Fit(data)
    
  3. get the parameters from the results

    results.truncated_power_law.parameter1 # power law  parameter (alpha)
    results.truncated_power_law.parameter2 # exponential cut-off parameter (beta)
    

I'm also working in the field of networks and I had to do a very similar fit as yours. I found a very easy and fast solution here, and the best part is that is doesn't need installing any packages apart from Scipy (which I'm sure you already have).

The distribution that I wanted to fit is a cut-off power law with a shift, such as the one described in this paper. Using your same notation, my fit is

f(x) = (x + x0)**alpha * exp(-beta*x)

so just add the third parameter x0 to your distribution. Notice that I assume that beta is positive and I just take the sign outside (I think that this makes it clearer that your exponential is decreasing).

The implementation goes like this:

import numpy as np.
import scipy.optimize as opt

def distribution(x, alpha, beta, x0):
    return (x + x0)**alpha * np.exp(-beta *x)

# ... I prepare my data here

fit = opt.curve_fit(distribution, x_data, y_data) # you can pass guess for the parameters/errors
alpha, beta, x0 = fit[0]

This is the result:

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!