I've searched the other threads on this topic, but none of the fixes work for me. I have the results of a natural experiment, and I want to show that the number of consecutive occurrences of an event fits an exponential distribution. My R shell is pasted below:
> f
> x
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26 27
> y
 [1] 1880  813  376  161  100   61   31    9    8    2    7    4    3    2    0
[16]    1    0    0    0    0    0    1    0    0    0    0    1
> dat2
    x    y
1   1 1880
2   2  813
3   3  376
4   4  161
5   5  100
6   6   61
7   7   31
8   8    9
9   9    8
10 10    2
11 11    7
12 12    4
13 13    3
14 14    2
> fm
> fm
> fm
Please forgive the bad formatting; this is my first post here. x contains the bins of a histogram, and y contains the number of occurrences in each bin. dat2 cuts off at 14 since the zero-count bins would throw off the exponential regression, and I really only need to fit the first 14. I have biological reasons to believe that the bins beyond 14 that have counts are special.

The issue I initially got was an infinity error, which I don't understand since none of the values are 0. After giving decent starting values, as suggested by a different post here, I get the singular gradient error. The only other posts I saw with that error had more variables. I tried increasing the number of iterations, but that did not help. Any help is appreciated.

A
1) linearize to get starting values

You need better starting values:
# starting values fm0
giving:
Nonlinear regression model
  model: y ~ f(x, a, b)
   data: x
        a         b
4214.4228   -0.8106
 residual sum-of-squares: 2388

Number of iterations to convergence: 6
Achieved convergence tolerance: 3.363e-06
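The code for this step is cut off in the post, so what follows is only a sketch of one way the linearization could be done, not necessarily what was originally run. It assumes f(x, a, b) = a * exp(b * x) (consistent with the model formulas written out in this answer) and rebuilds dat2 from the values printed in the question. The idea is to fit on the log scale first, where crude starting values are tolerated, and then pass those estimates to the fit on the original scale.

```r
# Data from the question (first 14 bins only)
dat2 <- data.frame(
  x = 1:14,
  y = c(1880, 813, 376, 161, 100, 61, 31, 9, 8, 2, 7, 4, 3, 2)
)

# Assumed model function: a is the scale, b is the rate
f <- function(x, a, b) a * exp(b * x)

# Step 1: fit on the log scale, starting from the crude values a = 1, b = 1
fm0 <- nls(log(y) ~ log(f(x, a, b)), dat2, start = c(a = 1, b = 1))

# Step 2: use those estimates as starting values on the original scale
fm <- nls(y ~ f(x, a, b), dat2, start = coef(fm0))
coef(fm)
```

The log-scale fit weights the small counts much more heavily than the original-scale fit does, so its coefficients are only used as a starting point, not as the final answer.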
1a) Similarly, we could use lm to get the initial values by writing

y ~ a * exp(b * x)

as

y ~ exp(log(a) + b * x)

and taking logs of both sides to get a model that is linear in log(a) and b:

log(y) ~ log(a) + b * x

which can be solved using lm:
fm_lm
giving:
Nonlinear regression model
  model: y ~ f(x, a, b)
   data: dat2
       a        b
4214.423   -0.811
 residual sum-of-squares: 2388

Number of iterations to convergence: 6
Achieved convergence tolerance: 3.36e-06
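The lm code is also truncated in the post. A minimal sketch of the approach just described, again assuming f(x, a, b) = a * exp(b * x) and the dat2 values from the question, might look like this. The name st is a hypothetical helper introduced here for the starting values.

```r
# Data from the question (first 14 bins only)
dat2 <- data.frame(
  x = 1:14,
  y = c(1880, 813, 376, 161, 100, 61, 31, 9, 8, 2, 7, 4, 3, 2)
)

# Assumed model function
f <- function(x, a, b) a * exp(b * x)

# Linear fit on the log scale: log(y) = log(a) + b * x
fm_lm <- lm(log(y) ~ x, dat2)

# Back-transform: the intercept estimates log(a), the slope estimates b
st <- list(a = exp(coef(fm_lm)[[1]]), b = coef(fm_lm)[[2]])

# Nonlinear fit on the original scale, started from the lm estimates
fm <- nls(y ~ f(x, a, b), dat2, start = st)
coef(fm)
```

This only works because every y in dat2 is strictly positive; a zero count would make log(y) infinite.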
1b) We can also get it to work by reparameterizing. In that case a = 1 and b = 1 will work provided we transform the initial values in line with the parameter transformation.
nls(y ~ exp(loga + b * x), dat2, start = list(loga = log(1), b = 1))
giving:
Nonlinear regression model
  model: y ~ exp(loga + b * x)
   data: dat2
  loga      b
 8.346 -0.811
 residual sum-of-squares: 2388

Number of iterations to convergence: 20
Achieved convergence tolerance: 3.82e-07
so b is as shown and a = exp(loga) = exp(8.346) = 4213.3, which matches the 4214.4 above up to the rounding of loga in the printed output.
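To avoid the rounding in the printed output, the back-transformation can be done on the stored coefficient instead. A small sketch, assuming the dat2 values from the question (fm_log and a_hat are names introduced here for illustration):

```r
# Data from the question (first 14 bins only)
dat2 <- data.frame(
  x = 1:14,
  y = c(1880, 813, 376, 161, 100, 61, 31, 9, 8, 2, 7, 4, 3, 2)
)

# Reparameterized fit: loga replaces a, so a = 1 becomes loga = log(1) = 0
fm_log <- nls(y ~ exp(loga + b * x), dat2, start = list(loga = log(1), b = 1))

# Recover a from the full-precision estimate of loga
a_hat <- exp(coef(fm_log)[["loga"]])
a_hat
```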
2) plinear

Another possibility that is even easier is to use alg="plinear", in which case starting values are not needed for the parameters entering linearly. In that case the starting value of b=1 in the question seems sufficient.
nls(y ~ exp(b * x), dat2, start = c(b = 1), alg = "plinear")
giving:
Nonlinear regression model
  model: y ~ exp(b * x)
   data: dat2
        b      .lin
  -0.8106 4214.4234
 residual sum-of-squares: 2388

Number of iterations to convergence: 11
Achieved convergence tolerance: 2.153e-06
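In the plinear output the linear parameter is reported as .lin, which plays the role of a in a * exp(b * x). A short sketch of pulling both estimates out by name, assuming the dat2 values from the question (fm_pl is a name introduced here):

```r
# Data from the question (first 14 bins only)
dat2 <- data.frame(
  x = 1:14,
  y = c(1880, 813, 376, 161, 100, 61, 31, 9, 8, 2, 7, 4, 3, 2)
)

# Only b needs a starting value; the multiplier is solved for linearly
fm_pl <- nls(y ~ exp(b * x), dat2, start = c(b = 1), algorithm = "plinear")

# ".lin" is the linear coefficient, i.e. a in a * exp(b * x)
a_hat <- coef(fm_pl)[[".lin"]]
b_hat <- coef(fm_pl)[["b"]]
c(a = a_hat, b = b_hat)
```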
Please check the nlsLM function in the minpack.lm package. It is a more robust version of nls and can handle data with a zero residual sum of squares.
https://www.r-bloggers.com/a-better-nls/
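A minimal sketch of what that could look like for this data, assuming minpack.lm is installed and using the dat2 values from the question. The starting values a = 1000 and b = -0.5 are rough guesses chosen here for illustration, not values taken from the thread.

```r
# Data from the question (first 14 bins only)
dat2 <- data.frame(
  x = 1:14,
  y = c(1880, 813, 376, 161, 100, 61, 31, 9, 8, 2, 7, 4, 3, 2)
)

# nlsLM uses the Levenberg-Marquardt algorithm instead of nls's Gauss-Newton
if (requireNamespace("minpack.lm", quietly = TRUE)) {
  fm_LM <- minpack.lm::nlsLM(
    y ~ a * exp(b * x),
    data  = dat2,
    start = list(a = 1000, b = -0.5)  # rough guesses (assumption)
  )
  print(coef(fm_LM))
}
```

The returned object is of class nls, so coef, predict, and summary work on it in the usual way.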