select last observation from longitudinal data

前端未结
关注
 6  1226
小蘑菇 2020-12-28 09:19
I have a data set with several time assessments for each participant. I want to select the last assessment for each participant. My dataset looks like this:

      
      
        
          6条回答        

        
                    
            
            
                         
                
              
              
                
                   难免孤独
                                             
                
                
                (楼主)
            
              
              
                2020-12-28 10:02
              

            
            
                        
I've been trying to use split and tapply a bit more to become more acquainted with them.  I know this question have been answered already but I thought I'd add another solotuion using split (pardon the ugliness; I'm more than open to feedback for improvement; thought maybe there was a use to tapply to lessen the code):

sdf <-with(df, split(df, ID))
max.week <- sapply(seq_along(sdf), function(x) which.max(sdf[[x]][, 'week']))
data.frame(t(mapply(function(x, y) y[x, ], max.week, sdf)))


I also figured why we have 7 answers here it was ripe for a benchmark.  The results may surprise you (using rbenchmark with R2.14.1 on a Win 7 machine):

# library(rbenchmark)
# benchmark(
#     DATA.TABLE= {dt <- data.table(df, key="ID")
#         dt[, .SD[which.max(outcome),], by=ID]},
#     DO.CALL={do.call("rbind", 
#         by(df, INDICES=df$ID, FUN=function(DF) DF[which.max(DF$week),]))},
#     PLYR=ddply(df, .(ID), function(X) X[which.max(X$week), ]),
#     SPLIT={sdf <-with(df, split(df, ID))
#         max.week <- sapply(seq_along(sdf), function(x) which.max(sdf[[x]][, 'week']))
#         data.frame(t(mapply(function(x, y) y[x, ], max.week, sdf)))},
#     MATCH.INDEX=df[rev(rownames(df)),][match(unique(df$ID), rev(df$ID)), ],
#     AGGREGATE=df[cumsum(aggregate(week ~ ID, df, which.max)$week), ],
#     #WHICH.MAX.INDEX=df[sapply(unique(df$ID), function(x) which.max(x==df$ID)), ],
#     BRYANS.INDEX = df[cumsum(as.numeric(lapply(split(df$week, df$ID), 
#         which.max))), ],
#     SPLIT2={sdf <-with(df, split(df, ID))
#         df[cumsum(sapply(seq_along(sdf), function(x) which.max(sdf[[x]][, 'week']))),
#         ]},
#     TAPPLY=df[tapply(seq_along(df$ID), df$ID, function(x){tail(x,1)}),],
# columns = c( "test", "replications", "elapsed", "relative", "user.self","sys.self"), 
# order = "test", replications = 1000, environment = parent.frame())

          test replications elapsed  relative user.self sys.self
6    AGGREGATE         1000    4.49  7.610169      2.84     0.05
7 BRYANS.INDEX         1000    0.59  1.000000      0.20     0.00
1   DATA.TABLE         1000   20.28 34.372881     11.98     0.00
2      DO.CALL         1000    4.67  7.915254      2.95     0.03
5  MATCH.INDEX         1000    1.07  1.813559      0.51     0.00
3         PLYR         1000   10.61 17.983051      5.07     0.00
4        SPLIT         1000    3.12  5.288136      1.81     0.00
8       SPLIT2         1000    1.56  2.644068      1.28     0.00
9       TAPPLY         1000    1.08  1.830508      0.88     0.00


Edit1:  I omitted the WHICH MAX solution as it does not return the correct results and returned an AGGREGATE solution as well that I wanted to use (compliments of Bryan Goodrich) and an updated version of split, SPLIT2, using cumsum (I liked that move).

Edit 2:  Dason also chimed in with a tapply solution I threw into the test that fared pretty well too.
    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                       
     查看其它6个回答


            
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复