R formula - how to write a series with summation in a compact way?

问题

I need to fit a Dunham expansion using R. It means, that I'd like to use nls() function to fit the formula

with Yk,l being the fitted parameters.

Of course I could expand it "by hand" for a certain ranges of k and l, but I'd like to know, if there is any way to write such R-formula in a more elegant way?

I had a look at Creating a summation formula in R, but it seems, that the point of the question is creation of an R-function itself and not a formula for regression functions.

回答1:

You can use expand.grid() to get all combinations of k and l; then build each term as a string; use paste() with the collapse argument to combine the terms into a single string; and coerce into a formula:

dunham_formula <- function(k, l) {
  terms <- with(expand.grid(k = k, l = l), {
    glue::glue("y{k}{l} * (v + .5)^{k} * (J * (J + 1))^{l}")
  })

  as.formula(paste("E ~", paste0(terms, collapse = " + ")))
}

dunham_formula(0:1, 0:1)
#> E ~ y00 * (v + 0.5)^0 * (J * (J + 1))^0 + y10 * (v + 0.5)^1 * 
#>     (J * (J + 1))^0 + y01 * (v + 0.5)^0 * (J * (J + 1))^1 + y11 * 
#>     (v + 0.5)^1 * (J * (J + 1))^1
#> <environment: 0x00000000159e58f8>

Let’s test it with some fake data:

set.seed(42)
n <- 50

df <- data.frame(
  E = rexp(n),
  v = runif(n),
  J = runif(n)
)

summary(nls(dunham_formula(0:1, 0:1), data = df))
#> Warning in nls(dunham_formula(0:1, 0:1), data = df): No starting values specified for some parameters.
#> Initializing 'y00', 'y10', 'y01', 'y11' to '1.'.
#> Consider specifying 'start' or using a selfStart model
#> 
#> Formula: E ~ y00 * (v + 0.5)^0 * (J * (J + 1))^0 + y10 * (v + 0.5)^1 * 
#>     (J * (J + 1))^0 + y01 * (v + 0.5)^0 * (J * (J + 1))^1 + y11 * 
#>     (v + 0.5)^1 * (J * (J + 1))^1
#> 
#> Parameters:
#>     Estimate Std. Error t value Pr(>|t|)
#> y00  0.11636    1.60105   0.073    0.942
#> y10  0.68614    1.56022   0.440    0.662
#> y01  0.31082    1.61183   0.193    0.848
#> y11  0.05498    1.51326   0.036    0.971
#> 
#> Residual standard error: 1.406 on 46 degrees of freedom
#> 
#> Number of iterations to convergence: 1 
#> Achieved convergence tolerance: 1.81e-07

Since this is actually a linear model, you could also instead make a function that returns the basis matrix, and then use lm() to fit the model:

dunham_basis <- function(v, J, k, l) {
  dunham_term <- function(k, l) {
    (v + .5) ^ k * (J * (J + 1)) ^ l
  }

  indices <- expand.grid(k = k, l = l)

  cols <- with(indices, Map(dunham_term, k, l))
  names(cols) <- apply(indices, 1, paste, collapse = ",")

  do.call("cbind", cols)
}

df$Y <- with(df, dunham_basis(v, J, k = 0:1, l = 0:1))

summary(lm(E ~ 0 + Y, data = df))
#> 
#> Call:
#> lm(formula = E ~ 0 + Y, data = df)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -1.5828 -0.7332 -0.3373  0.2736  5.3756 
#> 
#> Coefficients:
#>      Estimate Std. Error t value Pr(>|t|)
#> Y0,0  0.11636    1.60105   0.073    0.942
#> Y1,0  0.68614    1.56022   0.440    0.662
#> Y0,1  0.31082    1.61183   0.193    0.848
#> Y1,1  0.05498    1.51326   0.036    0.971
#> 
#> Residual standard error: 1.406 on 46 degrees of freedom
#> Multiple R-squared:  0.432,  Adjusted R-squared:  0.3826 
#> F-statistic: 8.745 on 4 and 46 DF,  p-value: 2.453e-05

^{Created on 2019-07-02 by the reprex package (v0.3.0)}

来源：https://stackoverflow.com/questions/56846961/r-formula-how-to-write-a-series-with-summation-in-a-compact-way

标签

regression