Selecting top N rows for each group based on value in column

南方客 2021-01-15 00:49

I have a data frame like the one below:

x<-c(3,2,1,8,7,11,10,9,7,5,4)
y<-c("a","a","a","b","b","c","c","c","c","c","c")
z<-c(2,2,2,1,1,3,3,3,3,3,3)
df<-data.frame(x,y,z)

For each group in y, I want to keep the z rows with the largest x.


        
4 Answers
  •  粉色の甜心
    2021-01-15 01:02

    A solution with base R:

    # Split df by y; within each group, order the rows by x (largest first),
    # keep the first z rows, and rbind the pieces back together:
    do.call(rbind, 
            lapply(split(df, df$y), 
                   function(df1) df1[order(df1$x, decreasing=TRUE), ][1:unique(df1$z), ]))
    #     x y z
    #a.1  3 a 2
    #a.2  2 a 2
    #b    8 b 1
    #c.6 11 c 3
    #c.7 10 c 3
    #c.8  9 c 3
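A slightly more defensive sketch of the same split-apply-combine idea, under stated assumptions: it rebuilds `df` from the question (assuming, as the output above implies, that z is 2, 1 and 3 for groups a, b and c), uses `head()` so that a z larger than the group size returns the whole group instead of NA-padded rows (which `1:z` indexing would produce), and resets the row names. The helper name `top_n_per_group` is hypothetical.

```r
# Hypothetical reconstruction of the question's data (z assumed constant per group).
df <- data.frame(
  x = c(3, 2, 1, 8, 7, 11, 10, 9, 7, 5, 4),
  y = c("a", "a", "a", "b", "b", "c", "c", "c", "c", "c", "c"),
  z = c(2, 2, 2, 1, 1, 3, 3, 3, 3, 3, 3)
)

# Hypothetical helper: within each group, sort by x (largest first) and keep
# the first z rows. head() returns the whole group if z exceeds its size.
top_n_per_group <- function(d) {
  pieces <- lapply(split(d, d$y), function(g) {
    head(g[order(g$x, decreasing = TRUE), ], g$z[1])
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL  # drop the "a.1", "b", "c.6" style row names
  out
}

res <- top_n_per_group(df)
print(res)
```

On the question's data this should keep the same six rows as above (x = 3, 2, 8, 11, 10, 9), just with plain row names.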
    

    EDIT:
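    A much more direct way (still in base R), provided in a comment by @mt1022. Note that it keeps the first z rows of each group in their current order, so it gives the same result as above only because x is already sorted in decreasing order within each group here: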

    df[ave(1:nrow(df), df$y, FUN = seq_along) <= df$z, ]
    #   x y z
    #1  3 a 2
    #2  2 a 2
    #4  8 b 1
    #6 11 c 3
    #7 10 c 3
    #8  9 c 3
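Because the `ave()` one-liner keeps the first z rows of each group as they appear, unsorted data needs an `order()` step first. The sketch below is an illustration under assumptions: the x values are the question's, but shuffled within each group (hypothetical data), and it contrasts the one-liner with a sort-first version.

```r
# Hypothetical data: the question's x values, shuffled within each group,
# so x is NOT in decreasing order inside the groups.
df <- data.frame(
  x = c(1, 3, 2, 7, 8, 4, 9, 11, 5, 10, 7),
  y = c("a", "a", "a", "b", "b", "c", "c", "c", "c", "c", "c"),
  z = c(2, 2, 2, 1, 1, 3, 3, 3, 3, 3, 3)
)

# One-liner applied directly: keeps the first z rows per group in their
# original order, not the rows with the largest x.
naive <- df[ave(seq_len(nrow(df)), df$y, FUN = seq_along) <= df$z, ]

# Sort by group, then by x in decreasing order, so the within-group
# position computed by ave() equals the within-group rank of x.
sorted <- df[order(df$y, -df$x), ]

# ave() applies seq_along to each group's slice and writes the results back
# into the original positions, giving 1, 2, 3, ... within every group.
pos <- ave(seq_len(nrow(sorted)), sorted$y, FUN = seq_along)

res <- sorted[pos <= sorted$z, ]
print(res)
```

Here `naive` keeps x = 1, 3, 7, 4, 9, 11, while `res` keeps x = 3, 2, 8, 11, 10, 9.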
    
