How to put a complicated equation into a R formula?

前端 未结 3 827
谎友^
谎友^ 2020-12-16 19:13

We have the diameter of trees as the predictor and tree height as the dependent variable. A number of different equations exist for this kind of data and we try to model som

3条回答
  •  别那么骄傲
    2020-12-16 19:52

    edit: fixed, no longer incorrectly using offset ...

    An answer that complements @shujaa's:

    You can transform your problem from

    H = 1.3 + D^2/(a+b*D+c*D^2)
    

    to

    1/(H-1.3) = a/D^2+b/D+c
    

    This would normally mess up the assumptions of the model (i.e., if H were normally distributed with constant variance, then 1/(H-1.3) wouldn't be. However, let's try it anyway:

    data(trees)
    df <- transform(trees,
                h=Height * 0.3048,   #transform to metric system
                dbh=Girth * 0.3048 / pi   #transform tree girth to diameter
                )
    lm(1/(h-1.3) ~ poly(I(1/dbh),2,raw=TRUE),data=df)
    
    ## Coefficients:
    ##                    (Intercept)  poly(I(1/dbh), 2, raw = TRUE)1  
    ##                       0.043502                       -0.006136  
    ## poly(I(1/dbh), 2, raw = TRUE)2  
    ##                       0.010792  
    

    These results would normally be good enough to get good starting values for the nls fit. However, you can do better than that via glm, which uses a link function to allow for some forms of non-linearity. Specifically,

    (fit2 <- glm(h-1.3 ~ poly(I(1/dbh),2,raw=TRUE),
                 family=gaussian(link="inverse"),data=df))
    
    ## Coefficients:
    ##                    (Intercept)  poly(I(1/dbh), 2, raw = TRUE)1  
    ##                       0.041795                       -0.002119  
    ## poly(I(1/dbh), 2, raw = TRUE)2  
    ##                       0.008175  
    ## 
    ## Degrees of Freedom: 30 Total (i.e. Null);  28 Residual
    ## Null Deviance:       113.2 
    ## Residual Deviance: 80.05     AIC: 125.4 
    ## 
    

    You can see that the results are approximately the same as the linear fit, but not quite.

    pframe <- data.frame(dbh=seq(0.8,2,length=51))
    

    We use predict, but need to correct the prediction to account for the fact that we subtracted a constant from the LHS:

    pframe$h <- predict(fit2,newdata=pframe,type="response")+1.3
    p2 <- predict(fit2,newdata=pframe,se.fit=TRUE) ## predict on link scale
    pframe$h_lwr <- with(p2,1/(fit+1.96*se.fit))+1.3
    pframe$h_upr <- with(p2,1/(fit-1.96*se.fit))+1.3
    png("dbh_tmp1.png",height=4,width=6,units="in",res=150)
    par(las=1,bty="l")
    plot(h~dbh,data=df)
    with(pframe,lines(dbh,h,col=2))
    with(pframe,polygon(c(dbh,rev(dbh)),c(h_lwr,rev(h_upr)),
          border=NA,col=adjustcolor("black",alpha=0.3)))
    dev.off()
    

    enter image description here

    Because we have used the constant on the LHS (this almost, but doesn't quite, fit into the framework of using an offset -- we could only use an offset if our formula were 1/H - 1.3 = a/D^2 + ..., i.e. if the constant adjustment were on the link (inverse) scale rather than the original scale), this doesn't fit perfectly into ggplot's geom_smooth framework

    library("ggplot2")
    ggplot(df,aes(dbh,h))+geom_point()+theme_bw()+
       geom_line(data=pframe,colour="red")+
       geom_ribbon(data=pframe,colour=NA,alpha=0.3,
                 aes(ymin=h_lwr,ymax=h_upr))
    
    ggsave("dbh_tmp2.png",height=4,width=6)
    

    enter image description here

提交回复
热议问题