LC50 / LD50 confidence intervals from multiple regression glm with interaction

I have a quasibinomial glm with two continuous explanatory variables (let's say "LogPesticide" and "LogFood") and an interaction. I would like to calculate the LC50 of the pesticide with confidence intervals at different amounts of food (e. g. the minimum and maximum food value). How can this be achieved?

Example: First I generate a data set.

mydata <- data.frame(
            LogPesticide = rep(log(c(0, 0.1, 0.2, 0.4, 0.8, 1.6) + 0.05), 4),
            LogFood = rep(log(c(1, 2, 4, 8)), each = 6)
          )

set.seed(seed=16) 

growth <- function(x, a = 1, K = 1, r = 1) {            # Logistic growth function. a = position of turning point
  Fx <- (K * exp(r * (x - a))) / (1 + exp(r * (x - a))) # K = carrying capacity
  return(Fx)                                            # r = growth rate (larger r -> narrower curve)
}

y <- rep(NA, length = length(mydata$LogPesticide))
y[mydata$LogFood == log(1)] <- growth(x = mydata$LogPesticide[mydata$LogFood == log(1)], a = log(0.1), K = 1, r = 6)
y[mydata$LogFood == log(2)] <- growth(x = mydata$LogPesticide[mydata$LogFood == log(2)], a = log(0.2), K = 1, r = 4)
y[mydata$LogFood == log(4)] <- growth(x = mydata$LogPesticide[mydata$LogFood == log(4)], a = log(0.4), K = 1, r = 3)
y[mydata$LogFood == log(8)] <- growth(x = mydata$LogPesticide[mydata$LogFood == log(8)], a = log(0.8), K = 1, r = 1)
mydata$Dead <- rbinom(n = length(y), size = 20, prob = y)
mydata$Alive <- 20 - mydata$Dead
mydata$Mortality <- cbind(mydata$Dead, mydata$Alive)

Then I fit the full glm. Model diagnostics are ok and all interaction terms are significant.

model <- glm(Mortality ~ LogPesticide * LogFood, family = quasibinomial, data = mydata)
plot(model)
Anova(model)
summary(model)

I tried to estimate LC50s with dose.p() from the MASS package. If LogFood was a factor, this would work when I re-fit the model as discussed in this post. But with two continuous explanatory variables, you get only 1 intercept, 2 slopes and a difference of slopes (for the interaction).

I can estimate the LC50s using effect(), but don’t know how to get the associated CI for LogPesticide.

# Create a set of fitted values.

library(effects)

food.min <- round(min(model$model$LogFood), 3)
food.max <- round(max(model$model$LogFood), 3)

eff <- effect("LogPesticide:LogFood", model, 
              xlevels = list(LogPesticide = seq(min(model$model$LogPesticide), max(model$model$LogPesticide), length = 100),
                             LogFood = c(food.min, food.max)))

eff2 <- as.data.frame(eff)


# Find fitted values closest to 0.5 when LogFood is minimal and maximal.

fit.min <- which.min(abs(eff2$fit[eff2$LogFood == food.min] - 0.5))
fit.min <- eff2$fit[eff2$LogFood == food.min][fit.min]

fit.max <- which.min(abs(eff2$fit[eff2$LogFood == food.max] - 0.5))
fit.max <- eff2$fit[eff2$LogFood == food.max][fit.max]


# Use those fitted values to predict the LC50s.

lc50.min <- eff2$LogPesticide[eff2$fit == fit.min & eff2$LogFood == food.min]
lc50.max <- eff2$LogPesticide[eff2$fit == fit.max & eff2$LogFood == food.max]


# Plot the results.

plot(fit ~ LogPesticide, data = eff2[eff2$LogFood == round(min(model$model$LogFood), 3),], type = "l")
lines(fit ~ LogPesticide, data = eff2[eff2$LogFood == round(max(model$model$LogFood), 3),], col = "red")
points(y = 0.5, x = lc50.min, pch = 19)
points(y = 0.5, x = lc50.max, pch = 19, col = "red")

From the code of dose.p() I see one must use the vcov matrix. effect() also provides a vcov matrix but I could not modify dose.p() to work with that information correctly. I would be grateful for any ideas!

Replicate data (update: new version of ggplot2 might not like weird data frames with matrices in them??)

mydata <- data.frame(
        LogPesticide = rep(log(c(0, 0.1, 0.2, 0.4, 0.8, 1.6) + 0.05), 4),
        LogFood = rep(log(c(1, 2, 4, 8)), each = 6)
      )
set.seed(seed=16) 

growth <- function(x, a = 1, K = 1, r = 1) {
    ## Logistic growth function. a = position of turning point
    ## K = carrying capacity
    ## r = growth rate (larger r -> narrower curve)
    return((K * exp(r * (x - a))) / (1 + exp(r * (x - a))))
}

rlf <- data.frame(LogFood=log(c(1,2,4,8)),
                              a=log(c(0.1,0.2,0.4,0.8)),
                              r=6,4,3,1)
mydata <- merge(mydata,rlf)
mydata <- plyr::mutate(mydata,
               y=growth(LogPesticide,a,K=1,r),
               Dead=rbinom(n=nrow(mydata),size=20,prob=y),
               N=20,
               Alive=N-Dead,
               pmort=Dead/N)


model <- glm(pmort ~ LogPesticide * LogFood, family = quasibinomial,
          data = mydata, weights=N)

For convenience:

cc <- setNames(coef(model),c("b_int","b_P","b_F","b_PF"))
vv <- vcov(model)
dimnames(vv) <- list(names(cc),names(cc))

Basic prediction data frame:

pframe <- with(mydata,
         expand.grid(LogPesticide=seq(min(LogPesticide),max(LogPesticide),
                      length=51),
                     LogFood=unique(LogFood)))
pframe$pmort <- predict(model,newdata=pframe,type="response")

Now let's break this down. The predicted value at a given level of (log) food F and (log) pesticide P is

logit(surv) = b_int + b_P*P + b_F*F + b_PF*F*P

Thus the logistic curve with respect to pesticide at level F is

logit(surv) = (b_int+b_F*F) + (b_P+b_PF*F)*P

We want to know the value of P for which logit(surv) is 0 (the LC50), so we need

0 = (b_int+b_F*F) + (b_P+b_PF*F)*P50
P50 = -(b_int+b_F*F)/(b_P+b_PF*F)

Translating to code:

P50mean <- function(logF) {
    with(as.list(cc), -(b_int+b_F*logF)/(b_P+b_PF*logF))
}
with(mydata,P50mean(c(min=min(LogFood),max=max(LogFood))))


pLC50 <- data.frame(LogFood=unique(mydata$LogFood))
pLC50 <- transform(pLC50,
               pmort=0.5,
               LogPesticide=P50mean(LogFood))

To get the confidence intervals, the two easiest methods are (1) delta method and (2) posterior prediction intervals (also called 'parametric Bayes' in some contexts). (You could also do nonparametric bootstrapping.)

Delta method

I tried to do this by hand but realized it was getting too hairy (all four coefficients are strongly correlated, and all of these correlations have to be kept track of in the computation -- it's not as easy as the usual formulas where the numerator and denominator are independent values ...)

library("emdbook")
deltavar(-(b_int+b_F*2)/(b_P+b_PF*2),meanval=cc,Sigma=vv)
## have to be a bit fancy here with eval/substitute ...
pLC50$var1 <- sapply(pLC50$LogFood,
            function(logF)
                 eval(substitute(
                     deltavar(-(b_int+b_F*logF)/(b_P+b_PF*logF),
                               meanval=cc,Sigma=vv),
                     list(logF=logF))))

Population prediction intervals

This assumes (slightly more weakly) that the sampling distribution of the parameters is multivariate Normal.

PP <- function(logF,n=1000) {
    b <- MASS::mvrnorm(n,mu=cc,Sigma=vv)
    pred <- with(as.data.frame(b),
         -(b_int+b_F*logF)/(b_P+b_PF*logF))
    return(var(pred))
}
set.seed(101)
pLC50$var2 <- sapply(pLC50$LogFood,PP)

The PPI would actually allow us to relax the assumptions a bit, by getting the quantiles of the distribution of predicted LC50s ... as it turns out (see below) the PPI-based confidence intervals are a bit wider than the Delta method ones, but they're not horribly far apart.

Now plot the whole mess:

library(ggplot2); theme_set(theme_bw())
gg0 <- ggplot(mydata,aes(LogPesticide,pmort,
              colour=factor(LogFood),
              fill = factor(LogFood))) + geom_point() +
       ## individual fits -- a bit ugly
       ##       geom_smooth(method="glm",aes(weight=N),
       ##           method.args=list(family=binomial),alpha=0.1)+
       geom_line(data=pframe,linetype=2)+
       geom_point(data=pLC50,pch=5,size=4)+
       geom_hline(yintercept=0.5,col="gray")

 gg0 + geom_errorbarh(data=pLC50,lwd=2,alpha=0.5,
                       aes(xmin=LogPesticide-1.96*sqrt(var1),
                           xmax=LogPesticide+1.96*sqrt(var1)),
                       height=0)+
       geom_errorbarh(data=pLC50,
                       aes(xmin=LogPesticide-1.96*sqrt(var2),
                           xmax=LogPesticide+1.96*sqrt(var2)),
                      height=0.02)

来源：https://stackoverflow.com/questions/35462144/lc50-ld50-confidence-intervals-from-multiple-regression-glm-with-interaction

标签

glm

logistic-regression