Select row with most recent date by group

南方客 2020-12-01 14:42

I have a data frame in R where the rows represent events, and one column is the date of the event. The thing the event is happening to is described by an ID column. So for e

5 Answers
  •  無奈伤痛
    2020-12-01 15:15

    It's probably a character flaw, but I sometimes resist picking up new packages. The "base R" functions can often do the job. In this case, though, I think the value of the dplyr package shows through: I stumbled in creating a good solution because the ave function returned a character value for a logical test, which I still don't understand. So I think dplyr is a real gem. And if I could, I'd like to insist that any upvotes be preceded by an upvote to akrun's answer. (It's hard to believe this hasn't already been asked and answered on SO.)

    Anyway:

    > df[ as.logical(
            ave(df$date, df$ID, FUN=function(d) as.Date(d , '%m/%d/%Y') == 
                                                 max(as.Date(d, '%m/%d/%Y'))))
          , ]
      ID       date
    2  1 03/14/2001
    6  2 02/01/2008
    7  3 08/22/2011
    

    I thought this would work, but it fails:

    > df[ ave(df$date, df$ID, FUN=function(d) as.Date(d , '%m/%d/%Y') ==max(as.Date(d, '%m/%d/%Y'))) , ]
         ID date
    NA   NA 
    NA.1 NA 
    NA.2 NA 
    NA.3 NA 
    NA.4 NA 
    NA.5 NA 
    NA.6 NA 
    NA.7 NA 
    NA.8 NA 
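
    A note on why the call without `as.logical` fails, as far as I can tell: `ave` writes its results back into a copy of its first argument, so when that argument is a character vector the logical results are coerced to the strings "TRUE" and "FALSE". Indexing a data frame with a character vector looks up row names, and no row is named "TRUE" or "FALSE", hence the rows of NA. A minimal sketch, using a made-up `df` (the values are hypothetical sample data, not the asker's):

    ```r
    # Hypothetical sample data: dates stored as character, as in the calls above.
    df <- data.frame(
      ID   = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
      date = c("01/05/2001", "03/14/2001", "02/20/2001",
               "06/30/2007", "11/15/2007", "02/01/2008",
               "08/22/2011", "04/09/2010", "12/25/2010"),
      stringsAsFactors = FALSE
    )

    # ave() assigns FUN's logical results into a copy of the character
    # vector df$date, so they come back as "TRUE"/"FALSE" strings.
    flag <- ave(df$date, df$ID, FUN = function(d) {
      parsed <- as.Date(d, "%m/%d/%Y")
      parsed == max(parsed)
    })
    class(flag)  # "character"

    # One way around it: give ave() a numeric vector and compare outside it.
    num  <- as.numeric(as.Date(df$date, "%m/%d/%Y"))
    keep <- num == ave(num, df$ID, FUN = max)  # a genuine logical vector
    df[keep, ]
    ```

    Doing the comparison outside `ave` means the result never passes through a character vector, so no `as.logical` is needed.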
    

    Here's another base R solution that worked the first time with no surprises:

    > do.call( rbind, by(df, df$ID, function(d) d[ which.max(as.Date(d$date, '%m/%d/%Y')), ] ) )
      ID       date
    1  1 03/14/2001
    2  2 02/01/2008
    3  3 08/22/2011
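
    One caveat worth sketching: `which.max` returns only the position of the first maximum, so if an ID has two rows sharing its latest date, the `by` version keeps one of them, while a comparison against the group maximum keeps both. The `ties` data frame below is hypothetical and exists only to show the difference:

    ```r
    # Hypothetical data in which ID 1 has two rows with the same latest date.
    ties <- data.frame(
      ID   = c(1, 1, 1, 2),
      date = c("03/14/2001", "01/05/2001", "03/14/2001", "02/01/2008"),
      stringsAsFactors = FALSE
    )

    # which.max() picks the first maximum in each group: one row per ID.
    first_only <- do.call(rbind, by(ties, ties$ID, function(d)
      d[which.max(as.Date(d$date, "%m/%d/%Y")), ]))

    # Comparing each date with its group maximum keeps every tied row.
    num      <- as.numeric(as.Date(ties$date, "%m/%d/%Y"))
    all_tied <- ties[num == ave(num, ties$ID, FUN = max), ]

    nrow(first_only)  # 2
    nrow(all_tied)    # 3
    ```

    Which behaviour is wanted depends on whether duplicate latest events should all be reported.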
    

    Here's one inspired by @rawr's notion of taking the last one from an ordered subset:

    > do.call( rbind, by(df, df$ID, function(d) tail( d[ order(as.Date(d$date, '%m/%d/%Y')), ] ,1)) )
      ID       date
    1  1 03/14/2001
    2  2 02/01/2008
    3  3 08/22/2011
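
    The same "last row of an ordered subset" idea can also be sketched without `by`: sort the whole data frame once by ID and date, then keep the last row seen for each ID with `duplicated(..., fromLast = TRUE)`. This is an alternative of my own rather than anything from the comments, and the `df` below is made-up sample data:

    ```r
    # Hypothetical sample data with character dates in %m/%d/%Y format.
    df <- data.frame(
      ID   = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
      date = c("01/05/2001", "03/14/2001", "02/20/2001",
               "06/30/2007", "11/15/2007", "02/01/2008",
               "08/22/2011", "04/09/2010", "12/25/2010"),
      stringsAsFactors = FALSE
    )

    # Sort by ID, then by parsed date, so each ID's latest row comes last.
    sorted <- df[order(df$ID, as.Date(df$date, "%m/%d/%Y")), ]

    # fromLast = TRUE marks every row except the last occurrence of each ID
    # as a duplicate; negating keeps just that last occurrence.
    latest <- sorted[!duplicated(sorted$ID, fromLast = TRUE), ]
    latest
    ```

    Unlike the `do.call(rbind, by(...))` versions, this keeps the original row names, so the result shows where each selected row sat in `df`.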
    
