dplyr - Group by and select TOP x %

天命终不由人 2020-12-01 11:31

Using the package dplyr and the function sample_frac it is possible to sample a percentage from every group. What I need is to first sort the elements in every

5 answers
  • 2020-12-01 11:50

    Here's another way

    library(dplyr)

    mtcars %>% 
      select(gear, wt) %>% 
      arrange(gear, desc(wt)) %>%   # heaviest cars first within each gear
      group_by(gear) %>% 
      slice(seq(n()*.2))            # keep the first 20% of rows per group
    
       gear    wt
      (dbl) (dbl)
    1     3 5.424
    2     3 5.345
    3     3 5.250
    4     4 3.440
    5     4 3.440
    6     5 3.570
    

    I take "top" to mean "having the highest value for wt" and so used desc().

  • 2020-12-01 11:56

    I believe this gets to the answer you're looking for.

    library(dplyr)
    
    mtcars %>% select(gear, wt) %>% 
      group_by(gear) %>% 
      arrange(gear, wt) %>%   # ascending order: keeps the lightest 20%; use desc(wt) for the heaviest
      filter(row_number() / n() <= .2)
    
  • 2020-12-01 12:05

    A slight variation using top_n and dplyr:

    library(dplyr)

    mtcars %>% 
     group_by(gear) %>% 
     select(gear, wt) %>% 
     arrange(gear, desc(wt)) %>% 
     top_n(n() * .2, wt)   # n should be a single number per group, not a sequence
    
      gear    wt
      <dbl> <dbl>
    1     3  5.42
    2     3  5.34
    3     3  5.25
    4     4  3.44
    5     4  3.44
    6     5  3.57
    
  • 2020-12-01 12:07

    I know this is coming late, but it might help someone now: dplyr has a newer function, top_frac.

    library(dplyr)

    mtcars %>%
      select(gear, wt) %>%
      group_by(gear) %>%
      arrange(gear, wt) %>%
      top_frac(n = 0.2, wt = wt)
    

    Here n is the fraction of rows to return and wt is the variable to be used for ordering.

    The output is as below.

    gear    wt
       3 5.250
       3 5.345
       3 5.424
       4 3.440
       4 3.440
       5 3.570
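
    On dplyr 1.0.0 or later, where top_n() and top_frac() are marked as superseded, the same selection can probably be written with slice_max(). This is a minimal sketch, assuming that version of dplyr is installed; the name top20 is only illustrative.

    ```r
    library(dplyr)

    # Keep the heaviest 20% of cars within each gear group.
    # prop is rounded down per group, so a group of 12 rows keeps 2.
    top20 <- mtcars %>%
      select(gear, wt) %>%
      group_by(gear) %>%
      slice_max(wt, prop = 0.2) %>%
      ungroup()

    top20
    ```

    Ties are kept by default (with_ties = TRUE), so a group can return more rows than the fraction alone implies.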

  • 2020-12-01 12:11

    Or another option with dplyr:

    mtcars %>% select(gear, wt) %>% 
      group_by(gear) %>% 
      arrange(gear, desc(wt)) %>% 
      filter(wt > quantile(wt, .8))
    
    Source: local data frame [7 x 2]
    Groups: gear [3]
    
       gear    wt
      (dbl) (dbl)
    1     3 5.424
    2     3 5.345
    3     3 5.250
    4     4 3.440
    5     4 3.440
    6     4 3.190
    7     5 3.570
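
    This returns 7 rows rather than 6 because quantile() interpolates a threshold value instead of counting rows, and for gear 4 that threshold lands just below 3.190. A small sketch that shows the per-group threshold and how many rows exceed it (the names cutoffs, cutoff and n_above are made up for illustration):

    ```r
    library(dplyr)

    cutoffs <- mtcars %>%
      group_by(gear) %>%
      summarise(
        cutoff  = unname(quantile(wt, 0.8)),  # interpolated 80th percentile of wt
        n_above = sum(wt > cutoff)            # rows that filter(wt > quantile(wt, .8)) keeps
      )

    cutoffs
    ```

    With small groups the two approaches can disagree by a row or so, so it is worth deciding whether "top 20%" means a row count or a value threshold.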
    