In repeated measures data, how to subset to select matched cases and controls?

Question

I have data clustered by family. My research question is whether two people in the same family who differ on characteristic x have the same binary (yes/no) outcome y. In some families, all members are "yes" for y; in others, some are "yes" and some are "no". I want to keep only the families with discordant outcome statuses. I'm guessing the code will involve some sort of conditional logic, but I can't quite figure it out. In the sample data below, for example, I only want families 2 and 3. Thank you for your help!

# sample data
# data.frame() with named arguments avoids the stray global variables
# that `famid <- c(...)` inside cbind() would create
df <- data.frame(
  famid   = c(1, 1, 2, 2, 3, 3, 3),
  individ = c(1, 2, 3, 4, 5, 6, 7),
  y       = c(0, 0, 0, 1, 0, 0, 1)
)

Answer 1:

With base R:

df[ave(df$y, df$famid, FUN = function(x) length(unique(x)) > 1) == 1, ]

With data.table:

library(data.table)
setDT(df)[, .SD[uniqueN(y)>1], by = famid]
# or:
setDT(df)[, if (uniqueN(y)>1) .SD, by = famid]

With dplyr:

library(dplyr)
df %>% group_by(famid) %>% filter(n_distinct(y) > 1)
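
All three approaches rest on the same idea: count the distinct values of y within each family and keep the families where that count exceeds one. As a rough illustration only, here is a step-by-step base R sketch of that logic on the question's sample data. The intermediate names `discordant`, `keep_ids`, and `result` are made up for this example and do not come from the original answer.

```r
# Sample data from the question
df <- data.frame(
  famid   = c(1, 1, 2, 2, 3, 3, 3),
  individ = c(1, 2, 3, 4, 5, 6, 7),
  y       = c(0, 0, 0, 1, 0, 0, 1)
)

# One logical flag per family: TRUE when the family has
# more than one distinct value of y (hypothetical helper name)
discordant <- tapply(df$y, df$famid, function(v) length(unique(v)) > 1)

# tapply() names its result by group, so the names are the famid values as strings
keep_ids <- names(discordant)[discordant]

# Keep only the rows that belong to a discordant family
result <- df[as.character(df$famid) %in% keep_ids, ]
print(result)
```

On the sample data this should leave the five rows for families 2 and 3. Note that `unique()` treats `NA` as a value of its own, so a family with a missing y and one observed value would also be flagged; if that is not wanted, missing outcomes would need to be dropped first.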

Source: https://stackoverflow.com/questions/35525978/in-repeated-measures-data-how-to-subset-to-select-matched-cases-and-controls

Tags

subset
