Question:

Given a dataframe df with a column called group, how do you randomly sample k groups from it in dplyr? It should return all rows from k groups (given there are at least k unique values in df$group), and every group in df should be equally likely to be returned.

Answer 1:

Just use sample() to pick k of the group labels (here k = 2), then keep the rows whose group was picked:

iris %>% filter(Species %in% sample(levels(Species),2))
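The one-liner above relies on Species being a factor. A minimal sketch of the same idea for the df and group column named in the question might look like the following. The helper name sample_groups and the toy data are hypothetical, not part of the original answer. It uses unique() instead of levels(), so it also works for character or numeric columns and never picks an unused factor level.

```r
library(dplyr)

# Hypothetical helper: keep all rows belonging to k randomly chosen groups.
sample_groups <- function(df, k) {
  groups <- unique(df$group)          # each group label that occurs, listed once
  stopifnot(k <= length(groups))      # need at least k distinct groups
  # Sampling positions with sample.int() avoids sample()'s special case
  # where a single number n is treated as 1:n.
  chosen <- groups[sample.int(length(groups), k)]
  df %>% filter(group %in% chosen)
}

# Toy data (assumed for illustration): groups of different sizes.
df <- data.frame(group = rep(c("a", "b", "c", "d"), times = c(1, 2, 3, 4)),
                 value = 1:10)
set.seed(1)
res <- sample_groups(df, 2)
```

Because the sampling is done over the distinct labels rather than over rows, every group is equally likely to be chosen regardless of how many rows it has.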

Answer 2:

Though I don't see why you'd want to do this in dplyr, since the base R equivalent is several times faster in this benchmark:

library(dplyr)
library(microbenchmark)

microbenchmark(
  dplyr = iris %>% filter(Species %in% sample(levels(Species), 2)),
  base  = iris[iris[["Species"]] %in% sample(levels(iris[["Species"]]), 2), ]
)

Unit: microseconds
  expr     min      lq     mean  median       uq      max neval cld
 dplyr 660.287 710.655 753.6704 722.629 771.2860 1122.527   100   b
  base  83.629  95.032 110.0936 106.057 119.1715  199.949   100  a

Note that [[ is known to be faster than $, although both work.
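As a rough sketch, the base R approach can be wrapped in a small function. The name sample_groups_base and its col argument are hypothetical, introduced only for illustration. Besides speed, [[ has the practical advantage that it accepts a column name stored in a variable, which $ does not.

```r
# Hypothetical helper: base R version, with the grouping column passed as a string.
sample_groups_base <- function(df, k, col = "group") {
  groups <- unique(df[[col]])         # [[ looks the column up by the string in col
  stopifnot(k <= length(groups))      # need at least k distinct groups
  chosen <- groups[sample.int(length(groups), k)]
  # drop = FALSE keeps the result a data frame even if df has one column
  df[df[[col]] %in% chosen, , drop = FALSE]
}

set.seed(42)
out <- sample_groups_base(iris, 2, col = "Species")
```

With iris this returns all 50 rows of each of the two selected species.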

Source: Randomly sample groups

