Replace missing values (NA) with most recent non-NA by group

南旧 2020-11-22 05:42

I would like to solve the following problem with dplyr, preferably with one of the window functions. I have a data frame with houses and buying prices. The following is an e

7 answers
  •  故里飘歌
    2020-11-22 06:04

    Since data.table v1.12.4, the package has had a nafill() function, similar to tidyr::fill() or zoo::na.locf(), so you can do:

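    library(data.table)
    setDT(df)  # convert df to a data.table by reference

    # carry the last observed price forward within each house
    df[, price := nafill(price, type = "locf"), by = houseID]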
    

    There is also setnafill(), which does not support grouping but can fill multiple columns at once, by reference.

    setnafill(df, type = 'locf', cols = 'price')
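
    Because setnafill() ignores groups, a carried-forward value can leak from one house into the next. A minimal sketch of the difference, using a small made-up table (the name `toy` and its values are illustrative only, not from the question):

    ```r
    library(data.table)

    # Hypothetical two-house example; values are made up for illustration.
    toy <- data.table(houseID = c(1L, 1L, 2L, 2L),
                      price   = c(100L, NA, NA, 30L))

    # Grouped fill: the NA at the start of house 2 stays NA.
    grouped <- copy(toy)
    grouped[, price := nafill(price, type = "locf"), by = houseID]

    # Ungrouped fill: house 1's last price is carried into house 2.
    ungrouped <- copy(toy)
    setnafill(ungrouped, type = "locf", cols = "price")

    print(grouped$price)    # 100 100 NA 30
    print(ungrouped$price)  # 100 100 100 30
    ```

    So for the per-house problem in the question, the grouped nafill() call is probably the one you want.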
    

    Data taken from @G. Grothendieck's answer:

    df = data.frame(houseID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
                                2L, 3L, 3L, 3L, 3L, 3L),
                    year = c(1995L, 1996L, 1997L, 1998L, 1999L, 1995L, 1996L,
                             1997L, 1998L, 1999L, 1995L, 1996L, 1997L, 1998L, 1999L),
                    price = c(NA, 100L, NA, 120L, NA, NA, NA, NA, 30L, NA, NA, 44L,
                              NA, NA, NA))
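
    For comparison, the tidyr::fill() route mentioned at the top could look roughly like this. This is a sketch rather than a tested solution to the original question: it assumes dplyr and tidyr are installed, rebuilds the same data compactly with rep(), and uses the illustrative name `filled` for the result.

    ```r
    library(dplyr)
    library(tidyr)

    # Same data as above, written compactly.
    df <- data.frame(houseID = rep(1:3, each = 5),
                     year    = rep(1995:1999, times = 3),
                     price   = c(NA, 100L, NA, 120L, NA, NA, NA, NA, 30L, NA,
                                 NA, 44L, NA, NA, NA))

    filled <- df %>%
      arrange(houseID, year) %>%         # make sure years are in order
      group_by(houseID) %>%              # fill() respects the grouping
      fill(price, .direction = "down") %>%
      ungroup()

    print(filled$price)
    # NA 100 100 120 120 NA NA NA 30 30 NA 44 44 44 44
    ```

    Leading NAs within a house stay NA, since there is no earlier price to carry forward.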
    
