Selecting top N rows for each group based on value in column

Submitted by 有些话、适合烂在心里 on 2019-12-02 05:06:17

Question


I have a data frame like the one below:

x<-c(3,2,1,8,7,11,10,9,7,5,4)
y<-c("a","a","a", "b","b","c","c","c","c","c","c")
z<-c(2,2,2,1,1,3,3,3,3,3,3)
df<-data.frame(x,y,z)

df
    x y z
1   3 a 2
2   2 a 2
3   1 a 2
4   8 b 1
5   7 b 1
6  11 c 3
7  10 c 3
8   9 c 3
9   7 c 3
10  5 c 3
11  4 c 3

I want to select the top n rows for each group defined by column y, where n is given in column z. The output should look like this:

output:
       x   y  z
     1 3   a  2
     2 2   a  2
     3 8   b  1
     4 11  c  3
     5 10  c  3
     6 9   c  3

Answer 1:


A solution with base R:

# df is split by y; within each group the rows are ordered by x (decreasing)
# and only the first z rows are kept, then everything is rbind-ed back together:
do.call(rbind, 
        lapply(split(df, df$y), 
               function(df1) df1[order(df1$x, decreasing=TRUE), ][1:unique(df1$z), ]))
#     x y z
#a.1  3 a 2
#a.2  2 a 2
#b    8 b 1
#c.6 11 c 3
#c.7 10 c 3
#c.8  9 c 3

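EDIT:
A much more direct way (still in base R), provided in a comment by @mt1022. Note that it keeps the first z rows of each group in their existing order, so it relies on df already being sorted by x in decreasing order within each group: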

df[ave(1:nrow(df), df$y, FUN = seq_along) <= df$z, ]
#   x y z
#1  3 a 2
#2  2 a 2
#4  8 b 1
#6 11 c 3
#7 10 c 3
#8  9 c 3
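
If the rows are not already sorted, the one-liner above would return the first z rows rather than the top z rows. A minimal sketch that sorts first, assuming unsorted input (the names df_shuffled, df_sorted, rank_in_group and top are illustrative and not part of the original answer):

```r
# Same data as in the question, but with the rows reversed so that
# they are no longer sorted by x (decreasing) within each group
x <- c(3, 2, 1, 8, 7, 11, 10, 9, 7, 5, 4)
y <- c("a", "a", "a", "b", "b", "c", "c", "c", "c", "c", "c")
z <- c(2, 2, 2, 1, 1, 3, 3, 3, 3, 3, 3)
df <- data.frame(x, y, z)
df_shuffled <- df[rev(seq_len(nrow(df))), ]

# Sort by group, then by x in decreasing order within each group
df_sorted <- df_shuffled[order(df_shuffled$y, -df_shuffled$x), ]

# Position of each row within its group (1, 2, 3, ...)
rank_in_group <- ave(seq_len(nrow(df_sorted)), df_sorted$y, FUN = seq_along)

# Keep the rows whose within-group position does not exceed z
top <- df_sorted[rank_in_group <= df_sorted$z, ]
top
```

The row names of top are those of the original rows, so they can be used to trace each selected row back to df.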



Answer 2:


One approach with data.table:

library(data.table)
setDT(df)
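# inc flags the first z rows of each group; the second column (inc) is dropped at the end
df[, .(inc = seq_len(.N) <= z, x, z), by = .(y)][inc == TRUE, -2]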
#   y  x z
#1: a  3 2
#2: a  2 2
#3: b  8 1
#4: c 11 3
#5: c 10 3
#6: c  9 3
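
An alternative data.table formulation may be easier to read: sort explicitly, then take the head of each group. This is only a sketch; the names dt and top_dt are illustrative, and it assumes z is constant within each group, as in the question:

```r
library(data.table)

dt <- data.table(
  x = c(3, 2, 1, 8, 7, 11, 10, 9, 7, 5, 4),
  y = c("a", "a", "a", "b", "b", "c", "c", "c", "c", "c", "c"),
  z = c(2, 2, 2, 1, 1, 3, 3, 3, 3, 3, 3)
)

# Sort by group, then by x in decreasing order (modifies dt by reference)
setorder(dt, y, -x)

# Within each group, keep the first z[1] rows, where z[1] is the
# number of rows requested for that group
top_dt <- dt[, head(.SD, z[1]), by = y]
top_dt
```

Sorting up front means the result does not depend on the row order of the input.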



Answer 3:


A solution with dplyr that uses do:

library(dplyr)

# For each group in y, keep the first z rows (z is constant within a group)
df %>%
   group_by(y) %>%
   do(head(., as.numeric(unique(.$z))))
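
Since do() is superseded in dplyr 1.0.0 and later, the same selection can probably be written with a grouped filter instead. A minimal sketch (the name top_filter is illustrative); like the do() version, it keeps the first z rows in their existing order, so it assumes the data is already sorted by x in decreasing order within each group:

```r
library(dplyr)

df <- data.frame(
  x = c(3, 2, 1, 8, 7, 11, 10, 9, 7, 5, 4),
  y = c("a", "a", "a", "b", "b", "c", "c", "c", "c", "c", "c"),
  z = c(2, 2, 2, 1, 1, 3, 3, 3, 3, 3, 3)
)

# row_number() counts rows within each group, so the filter keeps
# the rows whose within-group position does not exceed z
top_filter <- df %>%
  group_by(y) %>%
  filter(row_number() <= z) %>%
  ungroup()
top_filter
```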



Answer 4:


I'm posting the solution I was looking for, using dplyr. It is based on the answer by @HNSKD:

library(dplyr)
x<-c(3,2,1,8,7,11,10,9,7,5,4)
y<-c("a","a","a", "b","b","c","c","c","c","c","c")
z<-c(2,2,2,1,1,3,3,3,3,3,3)

df<-data.frame(x,y,z)

df %>% group_by(y) %>% slice(1:2)

This returns the first two rows for each group in y, regardless of the value in z:

# A tibble: 6 x 3
# Groups:   y [3]
      x y         z
  <dbl> <fct> <dbl>
1     3 a         2
2     2 a         2
3     8 b         1
4     7 b         1
5    11 c         3
6    10 c         3
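
The slice() call above uses a fixed range, so group b returns two rows even though its z is 1. To make the number of rows follow column z, as the question asks, the range can be computed per group. A minimal sketch, assuming z is constant within each group (the name top_slice is illustrative):

```r
library(dplyr)

df <- data.frame(
  x = c(3, 2, 1, 8, 7, 11, 10, 9, 7, 5, 4),
  y = c("a", "a", "a", "b", "b", "c", "c", "c", "c", "c", "c"),
  z = c(2, 2, 2, 1, 1, 3, 3, 3, 3, 3, 3)
)

top_slice <- df %>%
  arrange(y, desc(x)) %>%       # sort by group, then by x in decreasing order
  group_by(y) %>%
  slice(seq_len(first(z))) %>%  # rows 1..z of each group
  ungroup()
top_slice
```

Using seq_len() rather than 1:first(z) avoids selecting rows when z is 0 for a group.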


Source: https://stackoverflow.com/questions/45006712/selecting-top-n-rows-for-each-group-based-on-value-in-column
