Question
I would like to write a conditional statement inside mutate_at() so that approx() does not interpolate between values where there are more than 2 missing rows of data.
Here are the data:
dat <- data.frame(
  time = 1:10,
  var1 = c(10, 10, 10, 12, 12, 12, 15, 15, 15, 15),
  var2 = c( 1, NA,  3,  6, NA, NA, NA, 10,  9,  8),
  var3 = c(10, NA, NA, 13, 14, 16, NA, 18, 19, 20)
)
This is the chunk of code I would like to adapt so that it does NOT interpolate where there are more than 2 NAs between values (i.e., rows 5-7 in the var2 column should remain NA and all other NAs should be replaced with interpolated values).
library(tidyverse)
dat_int <- dat %>%
  mutate_at(vars(c(var2, var3)),
            funs(approx(time, ., time, rule = 1, method = "linear")[["y"]]))
Answer 1:
Step 1: Create a function, consecutiveNA, that identifies runs of consecutive NA values in a vector based on a threshold (specified by the argument len).
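consecutiveNA <- function(x, len = 2){
  # Run-length encode the NA indicator of x
  rl <- rle(is.na(x))
  # Flag only the NA runs that are at least `len` long
  logi <- rl$lengths >= len & rl$values
  rl$values <- logi
  # Expand back to a logical vector with the same length as x
  inver <- inverse.rle(rl)
  return(inver)
}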
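To see what the run-length step does, here is a small sketch that walks through rle() and inverse.rle() on the var2 values from the question. The threshold of 3 is chosen here only for illustration (it matches "more than 2 NAs"); the variable names x, rl and mask are just placeholders.

```r
# var2 from the question
x <- c(1, NA, 3, 6, NA, NA, NA, 10, 9, 8)

# Run-length encoding of the NA indicator
rl <- rle(is.na(x))
rl$lengths  # 1 1 2 3 3
rl$values   # FALSE TRUE FALSE TRUE FALSE

# Keep TRUE only for NA runs of length 3 or more
rl$values <- rl$lengths >= 3 & rl$values

# Expand back to one logical value per element of x
mask <- inverse.rle(rl)
mask
# FALSE FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE
```

The single NA in position 2 is not flagged, while the run in positions 5-7 is.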
Step 2: Apply the approx function to the target columns (as you did).
library(tidyverse)
dat_int <- dat %>%
  mutate_at(vars(c(var2, var3)),
            funs(approx(time, ., time, rule = 1, method = "linear")[["y"]]))
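Note that funs() has been deprecated since dplyr 0.8.0 and mutate_at() has been superseded by across() since dplyr 1.0.0. Assuming dplyr 1.0.0 or later is installed, the same interpolation step could be sketched like this (the data frame is repeated so the snippet runs on its own):

```r
library(dplyr)

dat <- data.frame(
  time = 1:10,
  var1 = c(10, 10, 10, 12, 12, 12, 15, 15, 15, 15),
  var2 = c( 1, NA,  3,  6, NA, NA, NA, 10,  9,  8),
  var3 = c(10, NA, NA, 13, 14, 16, NA, 18, 19, 20)
)

# Interpolate every NA in var2 and var3 linearly over time;
# .x is the current column, time is looked up in the data frame
dat_int <- dat %>%
  mutate(across(c(var2, var3),
                ~ approx(time, .x, time, rule = 1, method = "linear")[["y"]]))

dat_int$var2
# 1 2 3 6 7 8 9 10 9 8
```

At this point all gaps are filled, including the long one in var2; the masking in the later steps puts those NAs back.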
Step 3: Apply the consecutiveNA function to all columns in dat and convert the result to a matrix.
m_NA <- map(dat, consecutiveNA, len = 2) %>%
  as.data.frame() %>%
  as.matrix()
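Step 4: Use m_NA to set the cells of dat_int at its TRUE positions back to NA, and then the work is done. With len = 2, any run of two or more consecutive NAs stays NA, which is why rows 2-3 of var3 are still NA in the output. To keep only runs of more than two NAs, as the question asks, set len to 3.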
dat_int[m_NA] <- NA
dat_int
#    time var1 var2 var3
# 1     1   10    1   10
# 2     2   10    2   NA
# 3     3   10    3   NA
# 4     4   12    6   13
# 5     5   12   NA   14
# 6     6   12   NA   16
# 7     7   15   NA   17
# 8     8   15   10   18
# 9     9   15    9   19
# 10   10   15    8   20
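As a variation, the interpolation and the masking can be combined into one base R helper applied per column. This is only a sketch: interp_short_gaps and its max_gap argument are hypothetical names, not part of any package. Here max_gap = 2 means gaps of up to two NAs are filled and longer gaps are left as NA, which is the behaviour described in the question.

```r
# Hypothetical helper: interpolate y over x linearly, but leave
# NA runs longer than `max_gap` untouched
interp_short_gaps <- function(x, y, max_gap = 2) {
  filled <- approx(x, y, xout = x, rule = 1, method = "linear")[["y"]]
  rl <- rle(is.na(y))
  # TRUE only for NA runs with more than max_gap elements
  rl$values <- rl$values & rl$lengths > max_gap
  filled[inverse.rle(rl)] <- NA
  filled
}

dat <- data.frame(
  time = 1:10,
  var1 = c(10, 10, 10, 12, 12, 12, 15, 15, 15, 15),
  var2 = c( 1, NA,  3,  6, NA, NA, NA, 10,  9,  8),
  var3 = c(10, NA, NA, 13, 14, 16, NA, 18, 19, 20)
)

# Apply the helper to the target columns only
dat_int <- dat
dat_int[c("var2", "var3")] <- lapply(dat[c("var2", "var3")],
                                     interp_short_gaps, x = dat$time)

dat_int$var2  # 1 2 3 6 NA NA NA 10 9 8
dat_int$var3  # 10 11 12 13 14 16 17 18 19 20
```

Because the mask is computed inside the helper from the original column, var1 and time are never touched.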
Source: https://stackoverflow.com/questions/55599319/how-do-i-prevent-interpolation-between-values-where-there-are-more-than-2-missin