R - Keep first observation per group identified by multiple variables (Stata equivalent “bys var1 var2 : keep if _n == 1”)

孤独总比滥情好 2020-12-15 00:27

I currently face a problem in R that I know exactly how to deal with in Stata, but I have wasted over two hours trying to accomplish it in R.

Using the data.frame below, the

3 Answers
  •  鱼传尺愫
    2020-12-15 00:54

    I would first order the data.frame, after which you can use by() to take the first row of each group:

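    # Sort by id, then day, then value, so the row to keep comes first in each group
    mydata <- mydata[with(mydata, order(id, day, value)), ]
    
    # Split by (id, day), take the first row of each piece, and stack the results
    do.call(rbind, by(mydata, list(mydata$id, mydata$day), 
                      FUN=function(x) head(x, 1)))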
    

    Alternatively, look into the "data.table" package. Continuing with the ordered data.frame from above:

    library(data.table)
    
    DT <- data.table(mydata, key = "id,day")
    DT[, head(.SD, 1), by = key(DT)]
    #     id day value
    #  1:  1   1    10
    #  2:  1   2    15
    #  3:  1   3    20
    #  4:  2   1    40
    #  5:  2   2    30
    #  6:  3   2    22
    #  7:  3   3    24
    #  8:  4   1    11
    #  9:  4   2    11
    # 10:  4   3    12
    

    Or, starting from scratch with the raw vectors id, day and value, you can rank the values within each group and keep the row ranked first, which needs no prior ordering:

    DT <- data.table(id, day, value, key = "id,day")
    DT[, n := rank(value, ties.method="first"), by = key(DT)][n == 1]
    

    And, by extension, in base R:

    Ranks <- with(mydata, ave(value, id, day, FUN = function(x) 
      rank(x, ties.method="first")))
    mydata[Ranks == 1, ]
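
    Another base R option, not part of the approaches above, is duplicated(), which may be the closest analogue to Stata's "keep if _n == 1". The sketch below is only an illustration: the sample values are hypothetical (the question's data.frame is not reproduced here), and the names sorted and first_per_group are introduced just for this example.

    ```r
    # Hypothetical sample data; the question's original data.frame is not shown here.
    mydata <- data.frame(
      id    = c(1, 1, 1, 2, 2, 2),
      day   = c(1, 1, 2, 1, 1, 2),
      value = c(12, 10, 15, 40, 45, 30)
    )

    # Sort so that the row to keep comes first within each (id, day) group.
    sorted <- mydata[order(mydata$id, mydata$day, mydata$value), ]

    # duplicated() flags every repeat of an (id, day) pair after its first
    # occurrence, so negating it keeps exactly one row per group.
    first_per_group <- sorted[!duplicated(sorted[c("id", "day")]), ]
    rownames(first_per_group) <- NULL
    print(first_per_group)
    ```

    As with the by() approach, the sort order decides which row counts as "first"; if the existing row order is the one you want, the order() step can be skipped.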
    
