R subsetting a data frame into multiple data frames based on multiple column values

梦谈多话 2020-12-15 06:06

I am trying to split a data frame into multiple data frames based on the values in multiple columns. Here is my example:

>df
  v1   v2   v3   v4   v5
   A         


        
2 Answers
  • 2020-12-15 06:32

    There's now also nest() from tidyr, which is rather nice.

    library(tidyr)
    # tidyr >= 1.0.0 expects the new list-column to be named explicitly
    nestdf <- df %>% nest(data = v3:v5)

    > nestdf$data
    [[1]]
    # A tibble: 2 × 3
         v3    v4    v5
      <int> <int> <int>
    1     1    10    12
    2     1    10    12
    
    [[2]]
    # A tibble: 1 × 3
         v3    v4    v5
      <int> <int> <int>
    1    10    12     8
    
    [[3]]
    # A tibble: 2 × 3
         v3    v4    v5
      <int> <int> <int>
    1     2    12    15
    2     2    14    16
    

    Access individual tibbles with nestdf$data[[1]] and so on (nestdf$data[1] returns a list of length one rather than the tibble itself).
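
    A minimal, self-contained sketch of the same idea follows. The data frame is an assumption: it is rebuilt from the outputs printed in this thread, not copied from the question. The `data =` argument name follows the tidyr 1.0+ calling convention.

    ```r
    library(tidyr)

    # Hypothetical data, reconstructed from the outputs printed in this thread
    df <- data.frame(
      v1 = c("A", "D", "E", "A", "E"),
      v2 = c("Z", "Y", "X", "Z", "X"),
      v3 = c(1L, 10L, 2L, 1L, 2L),
      v4 = c(10L, 12L, 12L, 10L, 14L),
      v5 = c(12L, 8L, 15L, 12L, 16L)
    )

    # One row per (v1, v2) combination; v3:v5 are packed into a list-column
    nestdf <- nest(df, data = v3:v5)

    first_group <- nestdf$data[[1]]  # the tibble itself
    as_list     <- nestdf$data[1]    # a length-one list holding that tibble

    # Number of rows in each nested piece
    group_sizes <- vapply(nestdf$data, nrow, integer(1))
    ```

    The grouping columns v1 and v2 stay in nestdf next to the list-column, so each nested tibble can be matched back to its group.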

  • 2020-12-15 06:48

    You are looking for split():

    split(df, with(df, interaction(v1,v2)), drop = TRUE)
    $E.X
      v1 v2 v3 v4 v5
    3  E  X  2 12 15
    5  E  X  2 14 16
    
    $D.Y
      v1 v2 v3 v4 v5
    2  D  Y 10 12  8
    
    $A.Z
      v1 v2 v3 v4 v5
    1  A  Z  1 10 12
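
    Once the data frame is split, the pieces form an ordinary named list. The sketch below shows one way to work with them in base R. The data frame is a hypothetical stand-in rebuilt from the outputs in this thread, and the per-group mean is only an illustration.

    ```r
    # Hypothetical data, reconstructed from the outputs printed in this thread
    df <- data.frame(
      v1 = c("A", "D", "E", "A", "E"),
      v2 = c("Z", "Y", "X", "Z", "X"),
      v3 = c(1L, 10L, 2L, 1L, 2L),
      v4 = c(10L, 12L, 12L, 10L, 14L),
      v5 = c(12L, 8L, 15L, 12L, 16L)
    )

    # drop = TRUE discards (v1, v2) combinations that never occur in the data
    pieces <- split(df, list(df$v1, df$v2), drop = TRUE)

    # Each piece is a data frame, addressable by its "v1.v2" name
    ex_rows <- pieces[["E.X"]]

    # Apply a function to every piece, here the mean of v4 per group
    mean_v4 <- sapply(pieces, function(piece) mean(piece$v4))
    ```

    Without drop = TRUE the result would also contain empty data frames for every unobserved combination of v1 and v2.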
    

    As noted in the comments, any of the following would work:

    library(microbenchmark)
    microbenchmark(
      split(df, list(df$v1, df$v2), drop = TRUE),
      split(df, interaction(df$v1, df$v2), drop = TRUE),
      split(df, with(df, interaction(v1, v2)), drop = TRUE)
    )
    
    
    Unit: microseconds
                                                      expr      min        lq    median       uq      max neval
                split(df, list(df$v1, df$v2), drop = TRUE) 1119.845 1129.3750 1145.8815 1182.119 3910.249   100
         split(df, interaction(df$v1, df$v2), drop = TRUE)  893.749  900.5720  909.8035  936.414 3617.038   100
     split(df, with(df, interaction(v1, v2)), drop = TRUE)  895.150  902.5705  909.8505  927.128 1399.284   100
    

    It appears that interaction is slightly faster, probably because an f = list(...) argument is simply converted to an interaction inside split.
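
    A quick way to check that the list form and the interaction form group the rows the same way is to compare their results directly. This is a sketch on hypothetical data rebuilt from the outputs in this thread. It says nothing about timing.

    ```r
    # Hypothetical data, reconstructed from the outputs printed in this thread
    df <- data.frame(
      v1 = c("A", "D", "E", "A", "E"),
      v2 = c("Z", "Y", "X", "Z", "X"),
      v3 = c(1L, 10L, 2L, 1L, 2L),
      v4 = c(10L, 12L, 12L, 10L, 14L),
      v5 = c(12L, 8L, 15L, 12L, 16L)
    )

    by_list        <- split(df, list(df$v1, df$v2), drop = TRUE)
    by_interaction <- split(df, interaction(df$v1, df$v2), drop = TRUE)

    # Compare the two results element by element
    same_result <- identical(by_list, by_interaction)

    # The grouping factor both calls effectively rely on
    grouping <- interaction(df$v1, df$v2, drop = TRUE)
    ```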


    Edit

    If you just want to use the subset data.frames, then I would suggest using data.table for ease of coding:

    library(data.table)
    
    dt <- data.table(df)
    dt[, plot(v4, v5), by = list(v1, v2)]
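
    The same by-group pattern works for computations other than plotting. The sketch below assumes hypothetical data rebuilt from the outputs in this thread, and the summary columns n, mean_v4 and max_v5 are made up for illustration.

    ```r
    library(data.table)

    # Hypothetical data, reconstructed from the outputs printed in this thread
    df <- data.frame(
      v1 = c("A", "D", "E", "A", "E"),
      v2 = c("Z", "Y", "X", "Z", "X"),
      v3 = c(1L, 10L, 2L, 1L, 2L),
      v4 = c(10L, 12L, 12L, 10L, 14L),
      v5 = c(12L, 8L, 15L, 12L, 16L)
    )

    dt <- as.data.table(df)

    # The j expression is evaluated once per (v1, v2) group;
    # .N is the number of rows in the current group
    group_summary <- dt[, .(n = .N, mean_v4 = mean(v4), max_v5 = max(v5)),
                        by = .(v1, v2)]
    ```

    This avoids building the intermediate list of data frames when only a per-group result is needed.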
    