Select row with most recent date by group

南方客 2020-12-01 14:42

I have a data frame in R where the rows represent events, and one column is the date of the event. The thing the event is happening to is described by an ID column. So for e

5 Answers
  •  再見小時候
    2020-12-01 15:35

    Whichever solution you use, convert the date column to class Date first, as shown by @akrun:

    df$date <- as.Date(df$date, '%m/%d/%Y')
    

    Base R

    # For each ID, find the row number with the latest date, then subset by those rows
    df[
      tapply(seq_len(nrow(df)), df$ID, function(ii) ii[which.max(df$date[ii])])
    ,]
    

    This uses a selection of row numbers to subset the data. You can see the selection by running the middle line (between the []s) on its own.
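
    To make the row-number selection concrete, here is a minimal sketch on a toy data frame. The IDs and dates are made up for illustration and are not from the question; the intermediate name idx is also just an assumption for readability.

    ```r
    # Hypothetical toy data; the IDs and dates are invented for illustration.
    df <- data.frame(
      ID   = c(1, 1, 2, 2, 2, 3),
      date = c("01/05/2020", "03/10/2020", "02/01/2020",
               "12/25/2019", "02/20/2020", "07/04/2020")
    )
    df$date <- as.Date(df$date, "%m/%d/%Y")

    # For each ID, the row number holding that ID's latest date.
    idx <- tapply(seq_len(nrow(df)), df$ID,
                  function(ii) ii[which.max(df$date[ii])])
    idx

    # Use those row numbers to subset the data frame: one row per ID.
    latest <- df[idx, ]
    latest
    ```

    Printing idx before subsetting is the same as running the middle line on its own: it shows one row number per ID.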

    Data.table

    Similar to @rawr's:

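    library(data.table)
    DT <- as.data.table(df)
    
    # Sort by date ascending and keep the last row for each ID
    unique(DT[order(date)], by="ID", fromLast=TRUE)
    # or sort by date descending and keep the first row for each ID
    unique(DT[order(-date)], by="ID")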
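
    The sort-then-deduplicate idea can also be sketched without data.table. This is a base R analogue using duplicated(), not the data.table code itself, and the toy data is again made up for illustration.

    ```r
    # Hypothetical toy data; the IDs and dates are invented for illustration.
    df <- data.frame(
      ID   = c(1, 1, 2, 2, 2, 3),
      date = as.Date(c("2020-01-05", "2020-03-10", "2020-02-01",
                       "2019-12-25", "2020-02-20", "2020-07-04"))
    )

    # Sort ascending by date, then keep the last row seen for each ID.
    sorted <- df[order(df$date), ]
    latest <- sorted[!duplicated(sorted$ID, fromLast = TRUE), ]
    latest
    ```

    The result is in date order rather than ID order; reorder with latest[order(latest$ID), ] if that matters.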
    
