Subtract the previous year's value from each grouped row in a data frame

遥遥无期 2020-12-29 15:46

I am trying to calculate the lagged difference (or actual increase) for data that has been inadvertently aggregated. Each successive year in the data includes values from t

3 answers
  •  不思量自难忘°
    2020-12-29 15:51

    1) diff.zoo. With the zoo package it's just a matter of converting the data to a zoo object with split= and then taking the diff:

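    library(zoo)
    
    # one column per id, indexed by year; zz0 keeps the original cumulative values
    zz <- zz0 <- read.zoo(df, split = "id", index = "year", FUN = identity)
    # diff() returns the year-over-year changes for years 2 and 3; year 1 stays as is
    zz[2:3, ] <- diff(zz)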
    

    This gives the following (in wide form rather than the long form you mentioned), where each column is an id and each row after the first is that year's value minus the prior year's; the first year is unchanged:

    > zz
       1  2  3  4  5
    1  6  5  2  9  2
    2 10  5 10  7 13
    3  5 16 14 10 14
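    If zoo is not available, the same wide-form calculation can be sketched in base R. This is an alternative, not part of the answer: tapply() builds a year-by-id matrix and diff() on a matrix differences each column. The df below is hypothetical, built as cumulative sums of the increments printed above, with rows ordered by descending year.

```r
# Hypothetical cumulative data: 5 ids x 3 years, rows ordered by descending year
df <- data.frame(
  id    = rep(1:5, times = 3),
  year  = rep(3:1, each = 5),
  value = c(21, 26, 26, 26, 29,
            16, 10, 12, 16, 15,
             6,  5,  2,  9,  2)
)

# Wide matrix: rows are years (ascending), columns are ids;
# sum() just returns the single value in each year/id cell
wide <- tapply(df$value, list(year = df$year, id = df$id), sum)

# diff() on a matrix works column by column and returns one row fewer;
# the first year keeps its original value
actual <- wide
actual[-1, ] <- diff(wide)
print(actual)
```

    With this data the result has the same numbers as the zz printout above, with the years kept as row names rather than a time index.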
    

    The wide form may actually be preferable, but if you want long form you can convert zz like this:

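    # flatten a wide (years x ids) object into long (id, year, value) rows
    dt <- function(x) as.data.frame.table(t(x))
    # value = original cumulative values (zz0), actual = year-over-year increase (zz)
    setNames(cbind(dt(zz0), dt(zz)[3]), c("id", "year", "value", "actual"))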
    

    This puts the years in ascending order, which is the usual convention in R.

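    The dt() helper relies on as.data.frame.table(), which flattens a matrix into one row per cell with the first dimension varying fastest; transposing first makes id vary fastest within each year. A tiny base-R illustration with a hypothetical 2 x 2 year-by-id matrix (named dimnames supply the column names, and responseName names the value column):

```r
# Hypothetical year x id matrix of increments
m <- matrix(c(6, 10, 5, 5), nrow = 2,
            dimnames = list(year = c("1", "2"), id = c("1", "2")))

# t() swaps to id x year, so id varies fastest in the long result
long <- as.data.frame.table(t(m), responseName = "value")
print(long)
```

    Without named dimnames the first two columns come out as Var1 and Var2, which is why the answer renames them with setNames().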
    2) rollapply. Also using zoo, this alternative uses a rolling calculation to add the actual column to your data. It assumes the data is structured as you show, with the same number of years in each group and the rows arranged in order, so that the same id one year earlier is always 5 rows further down:

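    # window of 6 = current row plus the row 5 positions down (same id, one year earlier);
    # the last 5 rows have shorter windows and keep their own value
    df$actual <- rollapply(df$value, 6, partial = TRUE, align = "left",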
       FUN = function(x) if (length(x) < 6) x[1] else x[1]-x[6])
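    If the groups do not all have the same number of years, or the rows are not in a fixed order, one base-R alternative (a swapped-in technique, not from the answer) is to sort by id and year and difference within each id using ave(). It still assumes one row per id and year with no missing years. The df is hypothetical: cumulative values for 5 ids over 3 years.

```r
# Hypothetical cumulative data: 5 ids x 3 years, rows ordered by descending year
df <- data.frame(
  id    = rep(1:5, times = 3),
  year  = rep(3:1, each = 5),
  value = c(21, 26, 26, 26, 29,
            16, 10, 12, 16, 15,
             6,  5,  2,  9,  2)
)

# Put each id's rows in ascending year order
df <- df[order(df$id, df$year), ]

# Within each id: the first year keeps its value, later years get
# the increase over the previous year
df$actual <- ave(df$value, df$id, FUN = function(v) c(v[1], diff(v)))
print(df)
```

    Because each id is handled separately, ids with fewer years are fine; reorder afterwards if the original row order matters.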
    

    3) subtraction. Making the same assumptions as the rollapply solution, we can simplify further to just this, which subtracts from each value the value 5 positions later and leaves the last 5 rows unchanged (they subtract 0):

    transform(df, actual = value - c(tail(value, -5), rep(0, 5)))
    

    or this variation:

    transform(df, actual = replace(value, year > 1, -diff(ts(value), 5)))
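    To check that the two subtraction variants agree, here they are run on a hypothetical df laid out as these solutions assume: rows ordered by descending year, with the same 5 ids in the same order inside each year. The hard-coded 5 is replaced by a hypothetical n_ids variable. Note that -diff(ts(value), 5) gives value[i] - value[i + 5], which is why it is written only into the rows with year > 1.

```r
# Hypothetical layout: descending year, same 5 ids in each year block
df <- data.frame(
  id    = rep(1:5, times = 3),
  year  = rep(3:1, each = 5),
  value = c(21, 26, 26, 26, 29,
            16, 10, 12, 16, 15,
             6,  5,  2,  9,  2)
)
n_ids <- 5  # rows per year block (hypothetical name)

# Variant 1: subtract the value n_ids rows down; the earliest year subtracts 0
a <- transform(df, actual = value - c(tail(value, -n_ids), rep(0, n_ids)))

# Variant 2: lagged difference via ts, negated, written into all but the earliest year
b <- transform(df, actual = replace(value, year > 1, -diff(ts(value), n_ids)))

print(a)
print(all.equal(a$actual, b$actual))
```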
    

    EDIT: added rollapply and subtraction solutions.
