Multiple Logistic Regression with Interaction between Quantitative and Qualitative Explanatory Variables

断了今生、忘了曾经 提交于 2019-12-01 18:21:56

问题


As a follow up to this question, I fitted the Multiple Logistic Regression with Interaction between Quantitative and Qualitative Explanatory Variables. MWE is given below:

Type  <- rep(x=LETTERS[1:3], each=5)
Conc  <- rep(x=seq(from=0, to=40, by=10), times=3)
Total <- 50
Kill  <- c(10, 30, 40, 45, 38, 5, 25, 35, 40, 32, 0, 32, 38, 47, 40)

df <- data.frame(Type, Conc, Total, Kill)

fm1 <- 
  glm(
      formula = Kill/Total~Type*Conc
    , family  = binomial(link="logit")
    , data    = df
    , weights = Total
    )

summary(fm1)

Call:
glm(formula = Kill/Total ~ Type * Conc, family = binomial(link = "logit"), 
    data = df, weights = Total)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-4.871  -2.864   1.204   1.706   2.934  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.65518    0.23557  -2.781  0.00541 ** 
TypeB       -0.34686    0.33677  -1.030  0.30302    
TypeC       -0.66230    0.35419  -1.870  0.06149 .  
Conc         0.07163    0.01152   6.218 5.04e-10 ***
TypeB:Conc  -0.01013    0.01554  -0.652  0.51457    
TypeC:Conc   0.03337    0.01788   1.866  0.06201 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 277.092  on 14  degrees of freedom
Residual deviance:  96.201  on  9  degrees of freedom
AIC: 163.24

Number of Fisher Scoring iterations: 5

anova(object=fm1, test="LRT")

Analysis of Deviance Table

Model: binomial, link: logit

Response: Kill/Total

Terms added sequentially (first to last)


          Df Deviance Resid. Df Resid. Dev Pr(>Chi)    
NULL                         14    277.092             
Type       2    6.196        12    270.895  0.04513 *  
Conc       1  167.684        11    103.211  < 2e-16 ***
Type:Conc  2    7.010         9     96.201  0.03005 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


df$Pred <- predict(object=fm1, data=df, type="response")

df1 <- with(data=df,
               expand.grid(Type=levels(Type)
                           , Conc=seq(from=min(Conc), to=max(Conc), length=51)
                           )
      )
df1$Pred <- predict(object=fm1, newdata=df1, type="response")

library(ggplot2)
ggplot(data=df, mapping=aes(x=Conc, y=Kill/Total, color=Type)) + geom_point() +
  geom_line(data=df1, mapping=aes(x=Conc, y=Pred), linetype=2) +
  geom_hline(yintercept=0.5,col="gray")

I want to calculate LD50, LD90 and LD95 with their confidence intervals. As the interaction is significant so I want to calculate LD50, LD90 and LD95 with their confidence intervals for each Type (A, B, and C) separately.



LD stands for lethal dose. It is the amount of substance required to kill X% (LD50 = 50%) of the test population.

Edited Type is a qualitative variable representing different types of drugs and Conc is a quantitative variable representing different Concentrations of drugs.


回答1:


You use the drc package to fit logistic dose-response models.

First fit the model

require(drc)
mod <- drm(Kill/Total ~ Conc, 
           curveid = Type, 
           weights = Total, 
           data = df, 
           fct =  L.4(fixed = c(NA, 0, 1, NA)), 
           type = 'binomial')

Here curveid=specifies the grouping of the data and fct= specifies a 4 parameter logistic function, with parameters for lower and upper bond fixed at 0 and 1.

Note the differences to glm are negligible:

df2 <- with(data=df,
            expand.grid(Conc=seq(from=min(Conc), to=max(Conc), length=51),
                        Type=levels(Type)))
df2$Pred <- predict(object=mod, newdata = df2)

Here's a histgramm of the differences to the glm prediction

hist(df2$Pred - df1$Pred)

Estimate Effective Doses (and CI) from the model

This is easy with the ED() function:

ED(mod, c(50, 90, 95), interval = 'delta')

Estimated effective doses
(Delta method-based confidence interval(s))

     Estimate Std. Error   Lower  Upper
A:50   9.1468     2.3257  4.5885 13.705
A:90  39.8216     4.3444 31.3068 48.336
A:95  50.2532     5.8773 38.7338 61.773
B:50  16.2936     2.2893 11.8067 20.780
B:90  52.0214     6.0556 40.1527 63.890
B:95  64.1714     8.0068 48.4784 79.864
C:50  12.5477     1.5568  9.4963 15.599
C:90  33.4740     2.7863 28.0129 38.935
C:95  40.5904     3.6006 33.5334 47.648

For each group we get ED50, ED90 & ED95 with CI.




回答2:


Your link function of choice (\eta= X\hat\beta) has variance for a new observation (x_0): V_{x_0}=x_0^T(X^TWX)^{-1}x_0

So, for a set of candidate doses, we can predict the expected percentage of deaths using the inverse function:

newdata= data.frame(Type=rep(x=LETTERS[1:3], each=5),
                    Conc=rep(x=seq(from=0, to=40, by=10), times=3))
mm <- model.matrix(fm1, newdata)

# get link on link terms (could also use predict)
eta0 <- apply(mm, 1, function(i) sum(i * coef(fm1)))

# inverse logit function
ilogit <- function(x) return(exp(x) / (1+ exp(x)))

# predicted probs
ilogit(eta0)


# for comfidence intervals we can use a normal approximation
lethal_dose <- function(mod, newdata, alpha) {
  qn <- qnorm(1 - alpha /2)
  mm <- model.matrix(mod, newdata)
  eta0 <- apply(mm, 1, function(i) sum(i * coef(fm1)))
  var_mod <- vcov(mod)

  se <- apply(mm, 1, function(x0, var_mod) {
    sqrt(t(x0) %*% var_mod %*% x0)}, var_mod= var_mod)

  out <- cbind(ilogit(eta0 - qn * se),
               ilogit(eta0),
               ilogit(eta0 + qn * se))
  colnames(out) <- c("LB_CI", "point_est", "UB_CI")

  return(list(newdata=newdata,
              eff_dosage= out))
}

lethal_dose(fm1, newdata, alpha= 0.05)$eff_dosage
$eff_dosage
       LB_CI point_est     UB_CI
1  0.2465905 0.3418240 0.4517820
2  0.4361703 0.5152749 0.5936215
3  0.6168088 0.6851225 0.7462674
4  0.7439073 0.8166343 0.8722545
5  0.8315325 0.9011443 0.9439316
6  0.1863738 0.2685402 0.3704385
7  0.3289003 0.4044270 0.4847691
8  0.4890420 0.5567386 0.6223914
9  0.6199426 0.6990808 0.7679095
10 0.7207340 0.8112133 0.8773662
11 0.1375402 0.2112382 0.3102215
12 0.3518053 0.4335213 0.5190198
13 0.6104540 0.6862145 0.7531978
14 0.7916268 0.8620545 0.9113443
15 0.8962097 0.9469715 0.9736370

Rather than doing this manually, you could also manipulate:

predict.glm(fm1, newdata, se=TRUE)$se.fit



来源:https://stackoverflow.com/questions/36065327/multiple-logistic-regression-with-interaction-between-quantitative-and-qualitati

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!