Emulate split() with dplyr group_by: return a list of data frames

忘掉有多难 2020-11-29 06:07

I have a large dataset that chokes split() in R. I am able to use dplyr group_by (which is a preferred way anyway) but I am unable to persist the r

6 Answers
  •  被撕碎了的回忆
    2020-11-29 06:17

    Comparing the base, plyr and dplyr solutions, the base one still comes out fastest: by median time it is roughly twice as fast as dplyr and about seven times as fast as plyr.

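    library(plyr)            # load plyr before dplyr so dplyr's verbs are not masked
    library(dplyr)
    library(microbenchmark)

    # 26 letters x 10 subgroups x 100 rows = 26,000 rows
    df <- tibble(Group1 = rep(LETTERS, each = 1000),
                 Group2 = rep(rep(1:10, each = 100), 26),
                 Value  = rnorm(26 * 1000))

    microbenchmark(Base = df %>%
                     split(list(.$Group2, .$Group1)),
                   dplyr = df %>%
                     group_by(Group1, Group2) %>%
                     do(data = (.)) %>%          # one list-column entry per group
                     select(data) %>%
                     lapply(function(x) {(x)}) %>% .[[1]],   # extract the list of data frames
                   plyr = dlply(df, c("Group1", "Group2"), as_tibble),
                   times = 50)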
    

    Gives:

    Unit: milliseconds
      expr      min        lq      mean    median        uq       max neval
      Base 12.82725  13.38087  16.21106  14.58810  17.14028  41.67266    50
     dplyr 25.59038  26.66425  29.40503  27.37226  28.85828  77.16062    50
      plyr 99.52911 102.76313 110.18234 106.82786 112.69298 140.97568    50
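
    As a side note, dplyr 0.8.0 and later also offer group_split(), which returns a list of tibbles directly. The following is only a minimal sketch under that version assumption; it uses a small made-up data set rather than the benchmark data, it was not part of the timings above, and the names `small` and `pieces` are purely illustrative.

    ```r
    library(dplyr)

    # Small illustrative data set (hypothetical, not the benchmark data)
    small <- tibble(Group1 = rep(c("A", "B"), each = 4),
                    Group2 = rep(1:2, times = 4),
                    Value  = seq_len(8))

    grouped <- small %>% group_by(Group1, Group2)

    # group_split() returns one tibble per group combination, unnamed
    pieces <- group_split(grouped)

    # group_keys() lists the combinations in the same order,
    # so it can be used to name the list elements
    keys <- group_keys(grouped)
    pieces <- setNames(pieces, paste(keys$Group1, keys$Group2, sep = "."))
    ```

    Unlike split(), group_split() does not name the list elements, which is why the names are built from group_keys() here. Whether it is faster than base split() on a large data set would need to be benchmarked separately.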
    
