lmer or binomial GLMM

Submitted by China☆狼群 on 2021-02-11 06:48:11

Question


I am running a mixed model in R. However, I am having some difficulty understanding which type of model I should be running for the data that I have.

Let's call the dependent variable the number of early button presses in a computerised experiment. An experiment is made up of multiple trials. In each trial a participant has to press a button to react to a target appearing on a screen. However, they may press the button too early, and this is what is being measured as the outcome variable. So, for example, participant A may have 3 early button presses in total across the trials of an experiment, whereas participant B may have 15.

In a straightforward linear regression model using the lm command in R, I would think this outcome is a continuous numerical variable. After all, it's a number that participants score in the experiment. However, I am not trying to run a linear regression; I am trying to run a mixed model with random effects. My understanding of a mixed model in R is that the data the model is fitted to should be structured with one row for every participant in every trial. When the data is structured like this, at trial level, I suddenly have a lot of 1s and 0s in my outcome column, because at trial level a participant either accidentally presses the button too early, scoring a 1, or does not, scoring a 0.
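The two layouts described above can be sketched with made-up data. This is only an illustration: the data frame and the column names participant, trial and early are hypothetical and not taken from the actual experiment.

```r
# Hypothetical toy data: 3 participants x 4 trials; early = 1 if the button
# was pressed too early on that trial, 0 otherwise.
trial_level <- data.frame(
  participant = rep(c("A", "B", "C"), each = 4),
  trial       = rep(1:4, times = 3),
  early       = c(1, 0, 0, 1,   0, 0, 0, 0,   1, 1, 0, 1)
)

# Collapsing over trials recovers the per-participant totals, i.e. the
# "number of early presses" view of the same outcome.
per_participant <- aggregate(early ~ participant, data = trial_level, FUN = sum)

# Number of trials each participant completed (4 each in this toy example).
per_participant$n_trials <- as.vector(table(trial_level$participant))

print(per_participant)
```

The trial-level frame holds a 0/1 outcome, while the aggregated frame holds a count out of a known number of trials; both describe the same responses.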

Does this sound like something that needs to be treated as categorical? If so, would it then be modelled with the glmer function with family set to binomial?

Thanks


Answer 1:


As stated by Martin, this seems to be more of a question for Cross Validated. But I'll throw in my 2 cents here.

The question often becomes what you're interested in with the experiment, and whether you have cause to believe that there is a random effect in your model. In your example you have 2 possible effects that could be random: the individuals and the trials. In classical random-effect models the random effects are often chosen based on a series of rules of thumb, such as

  1. Whether the parameter can be thought of as random. This usually means that the levels of the factor would change if the experiment were repeated. In this situation both the individuals and the trials are likely to change between experiments.
  2. If you're interested in the systematic effect (e.g. how much did A affect B), then the effect is not random and should be included as a fixed effect. In your case, this is really only relevant if there are enough trials to see a systematic effect across individuals, but one could then question how relevant this effect would be for generalized results.

Several other rules of thumb exist out there, but this at least gives us a place to start. The next question becomes which effect we're actually interested in. In your case it is not quite clear, but it sounds like you're interested in one of the following.

  1. How many early button presses can we expect for any given trial?
  2. How many early button presses can we expect for any given individual?
  3. How big is the chance that an early button press happens during any given trial?

For the first 2, you can benefit from aggregating over either individual or trial and using a linear mixed effect model with the counterpart as random effect. I would argue, though, that a Poisson generalized linear mixed model is likely a better fit, as you are modelling counts, which cannot be negative. E.g., in a rather general sense, use:

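# df is assumed to contain the raw trial-level data: one row per individual
# per trial, with early_clicks coded 0/1.
# `condition` is a hypothetical covariate standing in for your fixed effects.
library(lme4)

#1) early clicks summed per individual (within each condition)
df_agg <- aggregate(early_clicks ~ individual + condition, data = df, FUN = sum)
lmer(early_clicks ~ condition + (1 | individual), data = df_agg)
# or better: glmer(early_clicks ~ condition + (1 | individual), family = poisson, data = df_agg)

#2) early clicks summed per trial (within each condition)
df_agg <- aggregate(early_clicks ~ trial + condition, data = df, FUN = sum)
lmer(early_clicks ~ condition + (1 | trial), data = df_agg)
# or better: glmer(early_clicks ~ condition + (1 | trial), family = poisson, data = df_agg)

#3) trial-level 0/1 outcome with crossed random intercepts
glmer(early_clicks ~ condition + (1 | trial) + (1 | individual), family = binomial, data = df)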

Note that we could use 3) to get answers for 1) and 2) by using 3) to predict probabilities and then using these to find the expected early_clicks. However, in linear mixed models the likelihood can be evaluated exactly, whereas in generalized linear mixed models the random effects have to be integrated out approximately (glmer uses the Laplace approximation by default). As such, the results may differ slightly (or quite substantially) between the models. In 3) especially, the number of random effects may be quite substantial compared to the number of observations, and in practice the model may be impossible to estimate.
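The idea of using 3) to recover expected counts can be sketched roughly as follows. This is only an illustration on simulated data: the sample sizes, the intercept of -1.5 and the random-effect standard deviations are arbitrary assumptions, and the model has no covariates.

```r
library(lme4)

set.seed(1)
# Hypothetical simulated data: 30 individuals x 40 trials, 0/1 outcome.
n_ind <- 30
n_trial <- 40
df <- expand.grid(individual = factor(1:n_ind), trial = factor(1:n_trial))
u_ind   <- rnorm(n_ind, sd = 0.8)    # individual deviations on the logit scale
u_trial <- rnorm(n_trial, sd = 0.4)  # trial deviations on the logit scale
eta <- -1.5 + u_ind[df$individual] + u_trial[df$trial]
df$early_clicks <- rbinom(nrow(df), size = 1, prob = plogis(eta))

# Model 3): binomial GLMM with crossed random intercepts.
fit <- glmer(early_clicks ~ 1 + (1 | trial) + (1 | individual),
             family = binomial, data = df)

# Fitted probability of an early click for every individual-trial pair,
# conditional on the estimated random effects.
df$p_hat <- predict(fit, type = "response")

# Summing the probabilities over trials gives an expected count per individual,
# which can be compared with the observed count.
expected <- aggregate(p_hat ~ individual, data = df, FUN = sum)
observed <- aggregate(early_clicks ~ individual, data = df, FUN = sum)
comparison <- merge(observed, expected, by = "individual")
head(comparison)
```

Summing p_hat over individuals within each trial instead gives the expected count per trial. The expected counts will generally be pulled towards the overall mean relative to the observed ones, because the random effects are shrunken estimates.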

Disclaimer

I have only very briefly gone over some principles, and while they may serve as a very brief introduction, they are by no means exhaustive. In the last 15-20 years both the theory and the practical side of mixed effect models have been extended substantially. If you'd like more information about mixed effect models, I'd suggest starting at the GLMM FAQ site by Ben Bolker (and others) and the references listed there. For estimation and implementations I suggest reading the vignettes of the lme4, glmmTMB and possibly merTools packages, glmmTMB being a more recent and interesting project.



Source: https://stackoverflow.com/questions/62469170/lmer-or-binomial-glmm
