Suppose you have a data frame with 9 columns. You want to remove rows which have all NAs in columns 5:9. It doesn't matter whether there are NAs in columns 1:4.
I don't know that it's any faster than your function, but maybe you could use any and !is.na on each row of your data frame. With this example data:
set.seed(1234)
x = do.call(cbind, lapply(1:9, function(x) runif(10)))
x[sample(length(x), size = 70)] <- NA
x <- data.frame(x)
> x
X1 X2 X3 X4 X5 X6 X7 X8 X9
1 0.11 NA NA 0.46 0.55 0.07 NA NA NA
2 0.62 NA NA NA NA NA 0.04 NA NA
3 NA NA NA 0.30 NA NA NA 0.01 NA
4 0.62 NA 0.04 0.51 NA NA NA NA NA
5 0.86 NA NA 0.18 NA NA NA NA 0.2
6 0.64 NA NA NA NA 0.50 NA 0.52 NA
7 NA NA NA NA 0.68 NA NA NA NA
8 NA NA NA NA NA NA NA NA NA
9 NA NA NA NA NA 0.17 NA NA NA
10 NA NA 0.05 NA NA NA NA NA NA
Looks like the 4th, 8th, and 10th rows should be dropped. So, you can use apply to iterate over each row and check whether the condition is satisfied: any row with at least one non-NA value in the 5th to 9th columns will return TRUE, so you can use the result as a logical index for your data frame.
keep.rows <- apply(x[, 5:9], 1, FUN = function(row) {
  any(!is.na(row))
})
> x[keep.rows, ]
X1 X2 X3 X4 X5 X6 X7 X8 X9
1 0.11 NA NA 0.46 0.55 0.07 NA NA NA
2 0.62 NA NA NA NA NA 0.04 NA NA
3 NA NA NA 0.30 NA NA NA 0.01 NA
5 0.86 NA NA 0.18 NA NA NA NA 0.2
6 0.64 NA NA NA NA 0.50 NA 0.52 NA
7 NA NA NA NA 0.68 NA NA NA NA
9 NA NA NA NA NA 0.17 NA NA NA
Again, not sure that it's faster than your function but... maybe?
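If speed is the concern, a vectorized variant might be worth trying. This is only a sketch, not something I've benchmarked against your function: it counts the NAs per row with rowSums and keeps rows where that count is less than the number of columns checked. The data frame df and the variable cols are made up for illustration.

```r
# Hypothetical data frame: columns 5:9 are entirely NA in rows 2 and 4
df <- data.frame(matrix(1, nrow = 4, ncol = 9))
df[2, 5:9] <- NA
df[4, 5:9] <- NA
df[3, 1:4] <- NA   # NAs in columns 1:4 should not affect the result
df[1, 5] <- NA     # a partial NA in columns 5:9 should not drop the row

cols <- 5:9

# is.na() on a data frame returns a logical matrix, so rowSums()
# gives the number of NAs in the selected columns for each row
keep.rows <- rowSums(is.na(df[, cols])) < length(cols)

result <- df[keep.rows, ]
```

Here result keeps rows 1 and 3. Because rowSums works on the whole logical matrix at once instead of calling a function per row, it may be quicker than apply on large data frames, but you'd want to time it on your own data.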