Fill NAs in a time series only up to a limited number

滥情空心 2020-12-01 17:08

Is there a way to forward-fill NAs in a zoo or xts object, but only for a limited number of consecutive NAs? In other words, like fill

5 Answers
  • 2020-12-01 17:30

    This does not use na.locf. The idea is to split the xts/zoo values into groups, each starting at a non-missing value, and then, within each group, replace only the first 3 values after the non-missing one with that value. It is still a loop, but since it runs once per group rather than once per element, it should be faster than a simple loop over all the values.

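    library(zoo)

    # Split the values into runs: each run starts at a non-missing value
    # and contains the NAs that follow it (x is the zoo series to fill).
    zz <- unlist(lapply(split(coredata(x), cumsum(!is.na(x))),
                        function(sx) {
                          if (length(sx) > 3) {
                            sx[2:4] <- sx[1]   # fill at most the 3 values after the observation
                          } else {
                            sx[] <- sx[1]      # short run: fill all of it
                          }
                          sx
                        }), use.names = FALSE)
    ## The fill above is applied only to the bare values, so rebuild the zoo object
    zoo(zz, index(x))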
    
    2014-09-20 2014-09-21 2014-09-22 2014-09-23 2014-09-24 2014-09-25 2014-09-26 2014-09-27 2014-09-28 2014-09-29 2014-09-30 2014-10-01 2014-10-02 
             1          1          1          1          5          5          5          5         NA         NA         11         12         12 
    2014-10-03 2014-10-04 2014-10-05 2014-10-06 2014-10-07 2014-10-08 2014-10-09 
            12         12         NA         NA         NA         19         20 
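
    The limit of 3 is hard-coded above. As a rough sketch only (the function name fill_limited and its arguments v and n are my own, not part of the answer), the same split-by-run idea can be parameterised in base R and applied to the core data of the series:

```r
# Hypothetical helper: forward-fill at most n NAs after each observation.
# v is a plain vector, e.g. coredata(x) of a zoo/xts object.
fill_limited <- function(v, n = 3) {
  # Each run starts at a non-missing value; leading NAs form run "0"
  runs <- split(v, cumsum(!is.na(v)))
  out <- lapply(runs, function(g) {
    k <- min(length(g), n + 1)               # the observation plus up to n fills
    c(rep(g[1], k), rep(NA, length(g) - k))  # the rest of the run stays NA
  })
  unlist(out, use.names = FALSE)
}

toy <- c(NA, 1, NA, NA, NA, NA, NA, 7, NA)
filled_toy <- fill_limited(toy, n = 3)
print(filled_toy)
```

    For a zoo object the result would be wrapped again with zoo(fill_limited(coredata(x)), index(x)), as in the answer.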
    
  • 2020-12-01 17:44

    Here's another way:

    l <- cumsum(! is.na(x))
    c(NA, x[! is.na(x)])[replace(l, ave(l, l, FUN=seq_along) > 4, 0) + 1]
    # [1]  1  1  1  1  5  5  5  5 NA NA 11 12 12 12 12 NA NA NA 19 20
    

    edit: my previous answer required that x have no duplicates. The current answer does not.
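
    To see why the one-liner works, here is a step-by-step version on a small toy vector (the toy data and the intermediate names pos, idx and n are assumptions added for illustration). Because values are looked up by run id rather than by value, the repeated 7 causes no trouble:

```r
# Toy input, not the data from the question
x_toy <- c(1, NA, NA, NA, NA, NA, 7, NA, 7)
n <- 3  # maximum number of NAs to fill after each observation

l   <- cumsum(!is.na(x_toy))          # id of the last observation seen so far
pos <- ave(l, l, FUN = seq_along)     # position inside each run (1 = the observation)
idx <- replace(l, pos > n + 1, 0)     # positions beyond the limit point at slot 0

# Slot 0 + 1 is the prepended NA; slot k + 1 is the k-th observation
filled <- c(NA, x_toy[!is.na(x_toy)])[idx + 1]

print(rbind(x_toy, l, pos, idx, filled))
```

    Leading NAs get l = 0, so they also land on the prepended NA and stay missing.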

    Benchmarks:

    library(microbenchmark)

    # Repeat the example values up to 10,000 elements for timing
    x <- rep(x, length.out=1e4)
    
    plourde <- function(x) {
        l <- cumsum(! is.na(x))
        c(NA, x[! is.na(x)])[replace(l, ave(l, l, FUN=seq_along) > 4, 0) + 1]
    }
    
    agstudy <- function(x) {
        unlist(sapply(split(coredata(x),cumsum(!is.na(x))),
               function(sx){
                 if(length(sx)>3) 
                   sx[2:4] <- rep(sx[1],3)
                 else sx <- rep(sx[1],length(sx))
                 sx
               }))
    }
    
    microbenchmark(plourde(x), agstudy(x))
    # Unit: milliseconds
    #        expr   min     lq median     uq   max neval
    #  plourde(x)  5.30  5.591  6.409  6.774 57.13   100
    #  agstudy(x) 16.04 16.249 16.454 17.516 20.64   100
    
  • 2020-12-01 17:45

    And another idea that, unless I've missed something, seems valid:

    na_locf_until = function(x, n = 3)
    {
       wnn = which(!is.na(x))  
       inds = sort(c(wnn, (wnn + n+1)[which((wnn + n+1) < c(wnn[-1], length(x)))]))
       c(rep(NA, wnn[1] - 1), 
         as.vector(x)[rep(inds, c(diff(inds), length(x) - inds[length(inds)] + 1))])
    }
    na_locf_until(x)
    #[1]  1  1  1  1  5  5  5  5 NA NA 11 12 12 12 12 NA NA NA 19 20
    
  • 2020-12-01 17:50

    From playing around in data.table comes this hacky solution:

    np1 <- 3 + 1
    dt[, 
       x_filled := x[c(rep(1, min(np1, .N)), rep(NA, max(0, .N - np1)))],
       by = cumsum(!is.na(x))]
    # Or slightly simplified:
    dt[, 
       x_filled := ifelse(rowid(x) < 4, x[1], x[NA]),
       by = cumsum(!is.na(x))]
    
    > dt
              date  x x_filled
     1: 2019-02-14  1        1
     2: 2019-02-15 NA        1
     3: 2019-02-16 NA        1
     4: 2019-02-17 NA        1
     5: 2019-02-18  5        5
     6: 2019-02-19 NA        5
     7: 2019-02-20 NA        5
     8: 2019-02-21 NA        5
     9: 2019-02-22 NA       NA
    10: 2019-02-23 NA       NA
    11: 2019-02-24 11       11
    12: 2019-02-25 12       12
    13: 2019-02-26 NA       12
    14: 2019-02-27 NA       12
    15: 2019-02-28 NA       12
    16: 2019-03-01 NA       NA
    17: 2019-03-02 NA       NA
    18: 2019-03-03 NA       NA
    19: 2019-03-04 19       19
    20: 2019-03-05 20       20
    

    We build on the fact that subsetting vectors with NA returns NA.
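
    A small base R illustration of that fact (the vectors v and grp are made up for the example): an NA inside an index vector yields NA at that position, and a bare logical NA is recycled to the full length, which is why x[NA] in the simplified version gives a full-length run of NAs of the right type.

```r
v <- c(10L, 20L, 30L)

a <- v[c(1, NA, 3)]   # NA in the index gives NA at that position
b <- v[NA]            # logical NA is recycled over the whole vector
d <- v[NA_integer_]   # integer NA is a single index, so length 1

# The pattern from the first data.table call, on one run of values:
grp <- c(5L, NA, NA, NA, NA, NA)   # one observation followed by five NAs
np1 <- 3 + 1                       # the observation plus 3 fills
grp_filled <- grp[c(rep(1, min(np1, length(grp))),
                    rep(NA, max(0, length(grp) - np1)))]
print(grp_filled)
```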

    Data/Packages

    library(zoo)
    library(data.table)
    x <- zoo(1:20, Sys.Date() + 1:20)
    x[c(2:4, 6:10, 13:18)] <- NA
    dt <- data.table(date = index(x), x = as.integer(x))
    
  • 2020-12-01 17:51

    The cleanest way to implement this in data.table is probably using the join syntax:

    na.omit(dt)[dt, on = .(date), roll = +3, .(date, x_filled = x, x = i.x)]
    
              date x_filled  x
     1: 2019-02-14        1  1
     2: 2019-02-15        1 NA
     3: 2019-02-16        1 NA
     4: 2019-02-17        1 NA
     5: 2019-02-18        5  5
     6: 2019-02-19        5 NA
     7: 2019-02-20        5 NA
     8: 2019-02-21        5 NA
     9: 2019-02-22       NA NA
    10: 2019-02-23       NA NA
    11: 2019-02-24       11 11
    12: 2019-02-25       12 12
    13: 2019-02-26       12 NA
    14: 2019-02-27       12 NA
    15: 2019-02-28       12 NA
    16: 2019-03-01       NA NA
    17: 2019-03-02       NA NA
    18: 2019-03-03       NA NA
    19: 2019-03-04       19 19
    20: 2019-03-05       20 20
    

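    *This solution depends on the date column, and on the dates being contiguous: roll = +3 limits the fill by distance in the join column (here, days), not by number of rows.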
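
    The caveat about contiguous dates can be checked on a tiny table with a missing calendar day (dt_gap and its values are hypothetical, made up for this sketch). As far as I understand rolling joins, the last row is only the second missing row, but it lies four days after the observation, so it should not be filled:

```r
library(data.table)

# Hypothetical data: 2019-02-16 and 2019-02-17 are absent from the table
dt_gap <- data.table(
  date = as.Date(c("2019-02-14", "2019-02-15", "2019-02-18")),
  x    = c(1L, NA, NA)
)

# Same join as above: roll the last observation forward by at most 3 days
res <- na.omit(dt_gap)[dt_gap, on = .(date), roll = 3,
                       .(date, x_filled = x, x = i.x)]
print(res)
```

    If the limit is meant to count rows rather than days, one of the run-based approaches in the other answers is probably the safer choice.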
