How to cumulatively add values in one vector in R


耶瑟儿～ 2020-12-01 18:54

I have a data set that looks like this:

id  name    year    job    job2
1   Jane    1980    Worker  0
1   Jane    1981    Manager 1
1   Jane    1982    Manage


      
      
        
5 answers

Happy的楠姐 (original poster) 2020-12-01 19:58
              

            
            
                        
Contributed by Matthew Dowle:

dt[, .SD[job != "Boss" | year == min(year)][, cumjob := cumsum(job2)],
     by = list(name, job)]


Explanation


Take the dataset
Run a filter and add a column within each Subset of Data (.SD)
Grouped by name and job
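
The steps above can also be written out one at a time. The following is only a sketch on a small made-up table (the rows and the helper column `keep` are assumptions for illustration, not the asker's data or the answer's exact code). It flags the rows to keep within each (name, job) group, subsets, and then takes the cumulative sum per group.

```r
library(data.table)

# Hypothetical toy data, not the asker's full table
dt <- data.table(
  id   = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  name = c("Jane", "Jane", "Jane", "Jane", "Bob", "Bob", "Bob"),
  year = c(1980L, 1981L, 1982L, 1983L, 1985L, 1986L, 1987L),
  job  = c("Manager", "Manager", "Boss", "Boss", "Manager", "Boss", "Boss"),
  job2 = c(1L, 1L, 0L, 0L, 1L, 0L, 0L)
)

# Step 1 (filter): within each (name, job) group, flag every non-Boss row
# and, for Boss groups, only the row with the earliest year.
# `keep` is a helper column introduced for this sketch.
dt[, keep := job != "Boss" | year == min(year), by = list(name, job)]
res <- dt[keep == TRUE]
res[, keep := NULL]

# Step 2 (add a column): cumulative sum of job2 within each (name, job) group
res[, cumjob := cumsum(job2), by = list(name, job)]
print(res)
```

Splitting the filter from the cumulative sum makes each grouped step easy to inspect on its own, at the cost of one temporary column.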




Older versions:

You have two different split-apply-combine operations here: one to get the cumulative job count, and the other to get the first row of Boss status. Here is an implementation in data.table that does each analysis separately (more or less) and then collects everything in one place with rbind. The main thing to note is the by=id piece, which means the other expressions are evaluated for each id group in the data. That is what you correctly noted was missing from your attempt.

library(data.table)
dt <- as.data.table(df)
dt[, cumujob:=0L]  # add column, set to zero
dt[job2==1, cumujob:=cumsum(job2), by=id]  # cumsum for manager time by person 
rbind(
  dt[job2==1],                     # this is just the manager portion of the data
  dt[job2==0, head(.SD, 1), by=id] # first job2 == 0 row per id (first Boss row)
)[order(id, year)]                 # order by id, year
#       id name year     job job2 cumujob
#   1:  1 Jane 1980 Manager    1       1
#   2:  1 Jane 1981 Manager    1       2
#   3:  1 Jane 1982 Manager    1       3
#   4:  1 Jane 1983 Manager    1       4
#   5:  1 Jane 1984 Manager    1       5
#   6:  1 Jane 1985 Manager    1       6
#   7:  1 Jane 1986    Boss    0       0
#   8:  2  Bob 1985 Manager    1       1
#   9:  2  Bob 1986 Manager    1       2
#  10:  2  Bob 1987 Manager    1       3
#  11:  2  Bob 1988    Boss    0       0


Note that this assumes the table is sorted by year within each id. If it isn't, that is easy to fix by sorting first (for example with setorder(dt, id, year)).
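
As a minimal sketch of why the sort order matters, assume a made-up table whose rows are stored out of year order. Sorting by id and year before taking the grouped cumulative sum ensures the running count follows the calendar rather than the storage order. The data here are hypothetical.

```r
library(data.table)

# Hypothetical rows stored out of year order
dt <- data.table(
  id   = c(1L, 1L, 1L),
  year = c(1982L, 1980L, 1981L),
  job2 = c(1L, 1L, 1L)
)

# Sort by reference so that year is increasing within each id
setorder(dt, id, year)

# The running count now follows the years: 1980 gets 1, 1981 gets 2, 1982 gets 3
dt[, cumujob := cumsum(job2), by = id]
print(dt)
```

Without the setorder call, the 1982 row would receive the count 1 simply because it is stored first.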



Alternatively, you could achieve the same with:

ans <- dt[, .I[job != "Boss" | year == min(year)], by=list(name, job)]
ans <- dt[ans$V1]
ans[, cumujob := cumsum(job2), by=list(name,job)] 


The idea is to get the row numbers where the condition matches (with .I, a special variable that holds row numbers), then subset dt on those row numbers (the $V1 part), and finally compute the cumulative sum.
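
To see what the intermediate result looks like, here is a small sketch on made-up data (the rows are assumptions for illustration). It prints the table of row numbers produced by .I before using its V1 column to subset.

```r
library(data.table)

# Hypothetical toy data, not the asker's full table
dt <- data.table(
  id   = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  name = c("Jane", "Jane", "Jane", "Jane", "Bob", "Bob", "Bob"),
  year = c(1980L, 1981L, 1982L, 1983L, 1985L, 1986L, 1987L),
  job  = c("Manager", "Manager", "Boss", "Boss", "Manager", "Boss", "Boss"),
  job2 = c(1L, 1L, 0L, 0L, 1L, 0L, 0L)
)

# .I holds the row numbers (in dt) of the current group; indexing it with the
# condition keeps only the row numbers that match. Because j is unnamed,
# data.table calls the resulting column V1.
idx <- dt[, .I[job != "Boss" | year == min(year)], by = list(name, job)]
print(idx)          # columns: name, job, V1

# Subset the original table on those row numbers
kept <- dt[idx$V1]
print(kept)
```

Collecting row numbers first and subsetting once avoids building a filtered copy of the data for every group.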
    
             
                                                        
            
            
              
                