How do I prevent interpolation between values where there are more than 2 missing rows of data?

怎甘沉沦 submitted on 2019-12-08 15:20:38

Step 1: Create a function, consecutiveNA, that flags runs of consecutive NA values in a vector. A run is flagged when its length is at least the threshold given by the argument len.

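consecutiveNA <- function(x, len = 2){
  # Run-length encode the NA indicator: one entry per run of TRUE/FALSE
  rl <- rle(is.na(x))
  # Keep only the runs that are NA and at least `len` long
  logi <- rl$lengths >= len & rl$values
  rl$values <- logi
  # Expand back to a logical vector with the same length as x
  inver <- inverse.rle(rl)
  return(inver)
}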
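To see what the function returns, here is a small check on a made-up vector (the values in x are only an illustration, not data from the question). The isolated NA is left unflagged, while the run of three NAs is flagged.

```r
consecutiveNA <- function(x, len = 2) {
  rl <- rle(is.na(x))
  rl$values <- rl$lengths >= len & rl$values
  inverse.rle(rl)
}

# Hypothetical toy vector: one isolated NA and one run of three NAs
x <- c(1, NA, 3, NA, NA, NA, 7)

flag <- consecutiveNA(x, len = 2)
print(flag)
# FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE

# With a threshold longer than any run, nothing is flagged
print(any(consecutiveNA(x, len = 4)))
# FALSE
```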

Step 2: Apply the approx function to target columns (as you did).

library(tidyverse)

# funs() and mutate_at() are deprecated in current dplyr; across() is the replacement
dat_int <- dat %>%
  mutate(across(c(var2, var3),
                ~ approx(time, .x, xout = time, rule = 1, method = "linear")[["y"]]))
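The reason a later masking step is needed is that approx drops the (x, y) pairs where y is NA and interpolates across every interior gap, however long. A minimal base R sketch on invented values (time and y here are hypothetical):

```r
# Hypothetical series with a gap of two NAs and a gap of one NA
time <- 1:6
y <- c(1, NA, NA, 4, NA, 6)

# approx() ignores the NA pairs and fills both gaps linearly
filled <- approx(time, y, xout = time, rule = 1, method = "linear")$y
print(filled)
# 1 2 3 4 5 6

# rule = 1 returns NA outside the range of observed points,
# so leading and trailing NAs are not extrapolated
y_edges <- c(NA, 2, NA, 4, NA)
edges <- approx(1:5, y_edges, xout = 1:5, rule = 1)$y
print(edges)
# NA 2 3 4 NA
```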

Step 3: Apply the consecutiveNA function to all columns in dat and convert the result to a matrix.

m_NA <- map(dat, consecutiveNA, len = 2) %>%
  as.data.frame() %>%
  as.matrix()

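Step 4: Use m_NA to set the flagged (TRUE) positions in dat_int back to NA, and the work is done. With len = 2, every run of two or more consecutive NAs stays missing. To block interpolation only where more than 2 consecutive rows are missing, as in the question, set len = 3.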

dat_int[m_NA] <- NA

dat_int
#    time var1 var2 var3
# 1     1   10    1   10
# 2     2   10    2   NA
# 3     3   10    3   NA
# 4     4   12    6   13
# 5     5   12   NA   14
# 6     6   12   NA   16
# 7     7   15   NA   17
# 8     8   15   10   18
# 9     9   15    9   19
# 10   10   15    8   20
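For reference, the four steps can also be sketched end to end in base R. The data frame below is a made-up stand-in, not the asker's original dat, and len = 3 is chosen so that only gaps of more than 2 rows stay missing. The loop and sapply are substitutions for the tidyverse calls above, not part of the original answer.

```r
consecutiveNA <- function(x, len = 2) {
  rl <- rle(is.na(x))
  rl$values <- rl$lengths >= len & rl$values
  inverse.rle(rl)
}

# Hypothetical toy data (not the data from the question)
dat <- data.frame(
  time = 1:8,
  var2 = c(1, NA, 3, NA, NA, NA, 7, 8),
  var3 = c(10, NA, NA, 13, 14, NA, 16, 17)
)

len <- 3  # runs of 3 or more NAs are left missing

# Interpolate every gap first
dat_int <- dat
for (col in c("var2", "var3")) {
  dat_int[[col]] <- approx(dat$time, dat[[col]], xout = dat$time, rule = 1)$y
}

# Logical matrix marking the long NA runs in the original data
m_NA <- sapply(dat, consecutiveNA, len = len)

# Put NA back wherever the original gap was too long
dat_int[m_NA] <- NA
print(dat_int)
```

In this toy case the single NA in var2 and the two-row gap in var3 are filled, while the three-row gap in var2 (rows 4 to 6) remains NA.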