Aggregate sum obs with different ID's in the same data frame

拟墨画扇 提交于 2019-12-20 06:14:24

问题


My goal is to make another column by summing the observation from the present day and all previous observations from the same ID by using the date (the data set is sorted in date and chr nr(ID). I will need the aggregation to start over when a new "id" is presented.

there might be som NA's, they should be considered as null

"Doseringer_pr_kg_dyr" is the observation.

CHR_NR    DATO_AFSLUT    Doseringer_pr_kg_dyr    brugstid 
10358    2018-08-06    29416.67    31   
10358    2018-09-06    104682.27    36 
10358    2018-10-12    10333.33    26   
10358    2018-11-07    10090.91    27   
10358    2018-12-04    8000.00    NA   
13168    2012-01-23    12042.25    2   
13168    2012-01-25    9000.00    42 
13168    2012-03-07    44450.70    19
13168    2012-03-26    35000.00    37 
13168    2012-05-02    93478.26    70

I expect something ala:

CHR_NR    DATO_AFSLUT    Doseringer_pr_kg_dyr    brugstid     sum
10358  2018-11-07    10090.91    27    [108,6]+[109,3]  
10358  2018-12-04    8000.00    NA    [109,6]+[110,3]
13168  2012-01-23    12042.25    2    [111,3]
13168  2012-01-25    9000.00    42    [111,6]+[112,3]
13168  2012-03-07    44450.70    19    [112,6]+[113,3]

where [row, column] + [row, column] is summed in the new column.

i thought of one of the apply functions or a loop like:

agg<-function(dat) {
  last_row <- 0
  for ( row in dat ) {
    if ( row[1] == last_row[1] ) {
      row[6] <- last_row[6] + row[3]
    } else { 
      row[6] <- row[3]
    }
    last_row <- row
  }
}

From the comment:

# dput(head(a)) 
a <- structure(list(CHR_NR = c(10358, 10358, 10358, 10358, 10358, 10358), 
  DATO_AFSLUT = structure(c(15349, 15387, 15426, 15441, 15455, 15476), 
  Level = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real ), 
  Doseringer_pr_kg_dyr = c(276152.688936271, 161616.695196052, 127549.514333096, 13333.3333333333, 86255.3897180763, 31034.1151385928 ), 
  brugstid = c(38, 39, 15, 14, 21, 15), 
  i = c(7267.17602463871, 4144.01782553979, 8503.30095553976, 952.380952380952, 4107.39951038459, 2068.94100923952)), 
  row.names = 6:11, class = "data.frame")

回答1:


a$Doseringer_pr_kg_dyr[is.na(a$Doseringer_pr_kg_dyr)]<-0 a$x<-ave(a$Doseringer_pr_kg_dyr,a$CHR_NR,FUN = cumsum)

where x is the cummulated and "ave" groups the accumulation in CHR



来源:https://stackoverflow.com/questions/56292735/aggregate-sum-obs-with-different-ids-in-the-same-data-frame

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!