Emulate split() with dplyr group_by: return a list of data frames

前端未结

关注

 6  598

忘掉有多难 2020-11-29 06:07

I have a large dataset that chokes split() in R. I am able to use dplyr group_by (which is a preferred way anyway) but I am unable to persist the r

6条回答

被撕碎了的回忆 (楼主)

2020-11-29 06:17

Comparing the base, plyr and dplyr solutions, it still seems the base one is much faster!

library(plyr)
library(dplyr)   

df <- data_frame(Group1=rep(LETTERS, each=1000),
             Group2=rep(rep(1:10, each=100),26), 
             Value=rnorm(26*1000))

microbenchmark(Base=df %>%
             split(list(.$Group2, .$Group1)),
           dplyr=df %>% 
             group_by(Group1, Group2) %>% 
             do(data = (.)) %>% 
             select(data) %>% 
             lapply(function(x) {(x)}) %>% .[[1]],
           plyr=dlply(df, c("Group1", "Group2"), as.tbl),
           times=50)

Gives:

Unit: milliseconds
  expr      min        lq      mean    median        uq       max neval
  Base 12.82725  13.38087  16.21106  14.58810  17.14028  41.67266    50
  dplyr 25.59038 26.66425  29.40503  27.37226  28.85828  77.16062   50
  plyr 99.52911  102.76313 110.18234 106.82786 112.69298 140.97568    50

0 讨论(0)

查看其它6个回答