How to get last data for each id/date?

前端 未结 3 1803
予麋鹿
予麋鹿 2020-12-11 12:30

I have a data frame that contains id, POSIXct(Date & Time)

> myData

   Tpt_ID    Tpt_DateTime               Value
1  1         2013-01-01 15:17:21 C         


        
相关标签:
3条回答
  • 2020-12-11 13:03

    You can do this pretty easily using data.table syntax...

    #  Load package
    require( data.table )
    
    #  Turn 'data.frame' into 'data.table'
    dt <- data.table( df )
    
    #  Make dates from date/time
    dt[ , Date:= as.Date( Tpt_DateTime ) ]
    
    #  Get last row of each group
    dt[ , .SD[.N] ,  by = c("Tpt_ID" , "Date") ]
    #   Tpt_ID       Date        Tpt_DateTime Value
    #1:      1 2013-01-01 2013-01-01 11:17:21    21
    #2:      2 2013-01-01 2013-01-01 07:18:32     5
    #3:      3 2013-01-01 2013-01-01 08:00:02     1
    #4:      1 2013-01-02 2013-01-02 07:10:11    15
    #5:      2 2013-02-02 2013-02-02 12:18:32     8
    #6:      3 2013-02-03 2013-02-03 14:00:02     3
    
    • First we turn your data-time data into a date with Date := as.Date( Tpt_DateTime )

    • Then we use .SD to get a subset of X's data for each group. .N contains the number of row for each group, so .SD[.N] gives us the last row for each group.

    • Lastly, the by=c("Tpt_ID" , "Date") defines the groups.

    0 讨论(0)
  • 2020-12-11 13:05

    Another data.table solution, which works in a manner similar to @SimonO101's answer -

    library(data.table)
    dt <- data.table( dt )
    dt[ , Date:= as.Date( Tpt_DateTime ) ]
    
    #indexing the data.table
    setkey(dt,Tpt_ID,Date)
    #mult = "last" returns te last row of each unique group as defined by the by argument
    dt[dt[unique(dt), mult="last", which=TRUE]]
    

    This returns -

    > dt[dt[unique(dt),,mult="last", which=TRUE]]
       Tpt_ID        Tpt_DateTime Value       Date
    1:      1 2013-01-01 16:47:21    21 2013-01-01
    2:      1 2013-01-02 12:40:11    15 2013-01-02
    3:      2 2013-01-01 12:48:32     5 2013-01-01
    4:      2 2013-02-02 17:48:32     8 2013-02-02
    5:      3 2013-01-01 13:30:02     1 2013-01-01
    6:      3 2013-02-03 19:30:02     3 2013-02-03
    
    0 讨论(0)
  • 2020-12-11 13:19

    And without data.table...

    x <- structure(list(Tpt_ID = c(1, 2, 3, 1, 2, 3, 1, 2, 3), Tpt_DateTime =    structure(c(1357024641, 1357024712, 1357027202, 1357110611, 1359775112, 1359864002, 1357039041, 1359807512, 1359900002), class = c("POSIXct", "POSIXt"), tzone = ""), Value = c(10, 5, 1, 15, 6, 2, 21, 8, 3)), .Names = c("Tpt_ID", "Tpt_DateTime", "Value"), row.names = c(NA, 9L), class = "data.frame")
    x$ID_Date <- paste(x$Tpt_ID,as.Date(x$Tpt_DateTime),sep="_")
    f <- function(id, mydf){
         tempdf <- mydf[mydf$ID_Date==id,]
         return(tempdf[which.max(tempdf$Tpt_DateTime),])
    }
    res <- as.data.frame(t(sapply(unique(x$ID_Date), f, mydf=x)))
    rownames(res) <- NULL
    res  
    
    0 讨论(0)
提交回复
热议问题