How to fit a data set to a specific function by trial and error or a better specific alternative in R?

Submitted by Deadly on 2021-01-29 20:23:10

Question


I have a data set that I want to fit to the following function, finding the parameters a and b:

y = (a*b*x) / (1 + b*x)

I tried the nonlinear least squares approach. However, I'd also like to try trial and error: take one vector of candidate values for a and another for b, plot the curves for every combination of these values, and choose the best fit.

library(readxl)
library(ggplot2)

x <- c(52.67, 46.80, 41.74, 40.45)
y <- c(1.73, 1.84, 1.79, 1.45)

df <- data.frame(x,y)

ggplot(data = df, aes(x, y))+
  geom_point()+
  stat_smooth(method="nls",
              se=FALSE,
              formula = y ~ (a*b*x)/(1+(b*x)),
              method.args = list(start = c(a=2.86, b=0.032)))


Answer 1:


I wonder if you're a bit mistrustful of the output of nls, thinking that perhaps you could find a better fit yourself?

Here's a way to at least give you a better feel for the fit created by different values of a and b. The idea is to create a plot with all the values of a on the x axis and all the values of b on the y axis. For each pair of a and b, we work out how close the resulting curve would be to our data (by taking the log of the sum of squared residuals). If the fit is good, we colour the tile with a bright colour, and if the fit is bad we colour it with a darker colour. This lets us see which combinations make good fits: effectively a heat map of the parameters.

# Our actual data, put in a data frame:
df <- data.frame(x = c(52.67, 46.80, 41.74, 40.45), y = c(1.73, 1.84, 1.79, 1.45))

# Create a grid of all a and b values we want to compare
a <- seq(-5, 10, length.out = 200)
b <- seq(0, 0.5, length.out = 100)
all_mixtures <- setNames(expand.grid(a, b), c("a", "b"))

# Get the log of the sum of squared residuals for each (a, b) pair:
all_mixtures$ss <- apply(all_mixtures, 1, function(i) {
  log(sum((i[1] * i[2] * df$x / (1 + i[2] * df$x) - df$y)^2))
})

Now we plot the heatmap:

p <- ggplot(all_mixtures, aes(a, b, fill = ss)) +
  geom_tile() + 
  scale_fill_gradientn(colours = c("white", "yellow", "red", "blue")) 
p

Clearly, the optimum pair of a and b lies somewhere on the white line.

Now let's see where the nls thought the best combination of a and b was:

p + annotate("point", x = 2.8312323, y = 0.0334379, size = 5)

It looks as though it has found the optimum just at the "bend" of the white line, which is probably what you would have guessed.

It looks like if you stray outside this white line, your fit will be worse, and you're not going to find anywhere on the white line that's better.

Trust the nls. Yes, the fit doesn't look very good, but that's simply because the data don't fit this particular formula very well, however you set its parameters. If your model has to be in this form, and these are your data, this is the best fit you are going to get.
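
For readers who still want to make the trial-and-error search explicit, the grid behind the heat map can also be ranked directly. What follows is a minimal base-R sketch and not part of the original answer: the names `model`, `grid`, and `best` are illustrative, and the grid bounds are simply the ones used for the heat map. The winning pair is only as precise as the grid spacing, so it should land near the nls estimate without matching it exactly.

```r
# Data from the question
x <- c(52.67, 46.80, 41.74, 40.45)
y <- c(1.73, 1.84, 1.79, 1.45)

# Model under test (hypothetical helper name)
model <- function(x, a, b) a * b * x / (1 + b * x)

# Every combination of candidate a and b values (same bounds as the heat map)
grid <- expand.grid(a = seq(-5, 10, length.out = 200),
                    b = seq(0, 0.5, length.out = 100))

# Sum of squared residuals for each (a, b) pair
grid$sse <- mapply(function(a, b) sum((y - model(x, a, b))^2),
                   grid$a, grid$b)

# Best pair found on the grid
best <- grid[which.min(grid$sse), ]
print(best)

# The six best-ranked pairs, to see how flat the optimum is
print(head(grid[order(grid$sse), ], 6))
```

If the top-ranked pairs have nearly identical `sse` values but quite different `a` and `b`, that reflects the long white ridge in the heat map: many combinations fit almost equally well.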




Answer 2:


What constitutes a better fit? Mathematically speaking, the best fit is the one that optimizes a goodness-of-fit metric. Let's obtain the parameters a and b that minimize the sum of squared deviations (the least-squares method):

First, define your metric (least_squares below):

x <- c(52.67, 46.80, 41.74, 40.45)
y <- c(1.73, 1.84, 1.79, 1.45)

y_hat <- function(x, a, b){
  a*b*x/(1 + b*x)
}

least_squares <- function(par, y, x){
  sum((y - y_hat(x, par[1], par[2]))^2)
}

After this, we minimize the metric with respect to a and b. One can use R's machinery for multivariate optimization (e.g., optim) for that:

optim(c(2.86, 0.032), least_squares, y=y, x=x)

which gives optimal values for the parameters:

$par
[1] 2.8312323 0.0334379

Here, c(2.86, 0.032) is an initial guess for parameters' values. You are free to define your own metric (for example, the sum of absolute deviations, weighted sum of least squares, etc.) according to what you need and optimize it. You can play with settings, but it is unlikely that you will arrive at a different result for the same optimization metric given how simple the example is.
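
As an illustration of swapping in a different metric, here is a minimal sketch that minimizes the sum of absolute deviations instead of the sum of squares. The function name `least_abs` and the object `fit_abs` are made up for this example, and optim's default Nelder-Mead method is assumed to be adequate for this small problem.

```r
x <- c(52.67, 46.80, 41.74, 40.45)
y <- c(1.73, 1.84, 1.79, 1.45)

y_hat <- function(x, a, b) {
  a * b * x / (1 + b * x)
}

# Alternative metric: sum of absolute deviations (hypothetical helper name)
least_abs <- function(par, y, x) {
  sum(abs(y - y_hat(x, par[1], par[2])))
}

# Same initial guess as in the least-squares example
start <- c(2.86, 0.032)
fit_abs <- optim(start, least_abs, y = y, x = x)

fit_abs$par          # estimated a and b under the new metric
fit_abs$value        # value of the metric at those estimates
fit_abs$convergence  # 0 means optim reports successful convergence
```

Because the absolute-deviation metric penalizes large residuals less heavily than squaring does, the estimates may differ from the least-squares ones. Comparing the two sets of parameters is a quick way to see how sensitive the fit is to the choice of metric.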



Source: https://stackoverflow.com/questions/62395844/how-to-fit-a-data-set-to-an-specific-function-by-trial-and-error-or-a-better-spe
