Randomly select groups (and all cases per group) in R?

Posted by 断了今生、忘了曾经 on 2019-12-22 20:46:12

Question


I have an R data frame with two levels of data: id and year. Within the groups defined by id, the years increase (every group in the dataset has the same number of years), like so:

id    year    var1    var2
11A   2001    ...     ...
11A   2002    ...     ...
11A   2003    ...     ...
11A   2004    ...     ...
13B   2001    ...     ...
13B   2002    ...     ...
13B   2003    ...     ...
13B   2004    ...     ...
22Z   2001    ...     ...

I have about 20,000 groups in my data, which is of course far too many to make nice plots of growth curves. How do I randomly select about 20 of my ids, so that all 4 yearly rows belonging to each selected id are also kept?


Answer 1:


This is pretty straightforward if you use sample and then index. Here's a made-up example that looks similar to what you've presented. It's really only two lines of code, and could be done in one if you wanted.

#made-up data: id is shorter than 20000 and is recycled to match the other columns
dat <- data.frame(id=paste0(LETTERS[1:8], rep(1:1250, 8)), 
   year=as.factor(as.character(sample(1990:2012, 20000, replace=TRUE))), 
   var1=rnorm(20000), var2=rnorm(20000))

#a look at the data
head(dat)

#sample 20 id's randomly
(ids <- sample(unique(dat$id), 20))

#narrow your data set
dat2 <- dat[dat$id %in% ids, ]
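
The one-line form mentioned above might look like the sketch below, followed by a quick check that every sampled id keeps all of its rows. The toy data frame and the names toy and picked are made up for illustration, and set.seed is an addition so that the draw can be repeated.

```r
# Hypothetical toy data: 50 ids with 4 years each (not the asker's real data)
set.seed(42)  # assumption: fixed seed so the random draw is reproducible
toy <- data.frame(
  id   = rep(sprintf("G%02d", 1:50), each = 4),
  year = rep(2001:2004, times = 50),
  var1 = rnorm(200)
)

# Sample 20 ids and keep every row belonging to them, in one step
picked <- toy[toy$id %in% sample(unique(toy$id), 20), ]

# Check: 20 distinct ids, each still with all 4 yearly rows
length(unique(picked$id))
all(table(picked$id) == 4)
```

Sampling the ids rather than the rows is what keeps each group intact: sampling rows directly would return scattered years from many different ids.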



Answer 2:


subset(df, id %in% sample(levels(df$id), 20))

This assumes that your data frame is called df and that id is a factor. If id is not a factor, use unique instead of levels.
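
As a minimal sketch of the unique variant: since R 4.0.0, data.frame no longer converts strings to factors by default, so id is often a character vector, and levels(df$id) then returns NULL. The data frame below and the names keep and sub are hypothetical stand-ins.

```r
# Hypothetical example data; df and its columns are stand-ins
set.seed(1)  # assumption: fixed seed so the draw is reproducible
df <- data.frame(
  id   = rep(c("11A", "13B", "22Z", "31C", "47K"), each = 4),
  year = rep(2001:2004, times = 5),
  var1 = rnorm(20),
  stringsAsFactors = FALSE  # id stays character, so levels(df$id) is NULL
)

# unique() works whether id is a factor or a character vector
keep <- sample(unique(df$id), 2)
sub  <- subset(df, id %in% keep)

nrow(sub)  # 2 ids x 4 years = 8 rows
```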



Source: https://stackoverflow.com/questions/13214769/randomly-select-groups-and-all-cases-per-group-in-r
