Remove rows which have all NAs in certain columns


面向向阳花 2020-12-11 08:08

Suppose you have a data frame with 9 columns. You want to remove the rows that have NA in every one of columns 5:9. It doesn't matter whether there are NAs in columns 1:4.

5 answers

南方客 (OP)

2020-12-11 08:31

I don't know whether it's any faster than your function, but you could combine any and !is.na on each row of your data frame. With this example data:

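set.seed(1234)
# 10 x 9 matrix of uniform random numbers, one column per lapply iteration
x <- do.call(cbind, lapply(1:9, function(i) runif(10)))
# overwrite 70 randomly chosen cells with NA
x[sample(length(x), size = 70)] <- NA
x <- data.frame(x)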

> x
     X1 X2   X3   X4   X5   X6   X7   X8  X9
1  0.11 NA   NA 0.46 0.55 0.07   NA   NA  NA
2  0.62 NA   NA   NA   NA   NA 0.04   NA  NA
3    NA NA   NA 0.30   NA   NA   NA 0.01  NA
4  0.62 NA 0.04 0.51   NA   NA   NA   NA  NA
5  0.86 NA   NA 0.18   NA   NA   NA   NA 0.2
6  0.64 NA   NA   NA   NA 0.50   NA 0.52  NA
7    NA NA   NA   NA 0.68   NA   NA   NA  NA
8    NA NA   NA   NA   NA   NA   NA   NA  NA
9    NA NA   NA   NA   NA 0.17   NA   NA  NA
10   NA NA 0.05   NA   NA   NA   NA   NA  NA

Looks like the 4th, 8th, and 10th rows should be dropped. You can use apply to iterate over the rows and test the condition: any row with at least one non-NA value in columns 5 through 9 returns TRUE, so the result can be used as a logical index into your data frame.

keep.rows <- apply(x[, 5:9], 1, FUN = function(row){
  any(!is.na(row))
})

> x[keep.rows, ]
    X1 X2 X3   X4   X5   X6   X7   X8  X9
1 0.11 NA NA 0.46 0.55 0.07   NA   NA  NA
2 0.62 NA NA   NA   NA   NA 0.04   NA  NA
3   NA NA NA 0.30   NA   NA   NA 0.01  NA
5 0.86 NA NA 0.18   NA   NA   NA   NA 0.2
6 0.64 NA NA   NA   NA 0.50   NA 0.52  NA
7   NA NA NA   NA 0.68   NA   NA   NA  NA
9   NA NA NA   NA   NA 0.17   NA   NA  NA

Again, not sure that it's faster than your function but... maybe?
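
If speed is the concern, one option worth trying is a vectorized version of the same test: count the NAs per row with rowSums(is.na(...)) and keep the rows where that count is less than the number of columns checked. This avoids calling a function once per row, so it is often faster on large data frames, though I haven't benchmarked it here. A minimal sketch; the data frame df and the column range cols are made-up illustrations, not the x from above:

```r
# Small illustrative data frame (hypothetical values, not the x shown earlier)
df <- data.frame(
  a = c(1, NA, 3, NA),
  b = c(NA, NA, 2, 5),
  c = c(NA, 4, NA, NA),
  d = c(NA, NA, NA, NA),
  e = c(7, NA, NA, NA)
)

# Columns to check (assumption for this example: the last three)
cols <- 3:5

# is.na() on a data frame returns a logical matrix; rowSums() counts the
# TRUEs per row. A row is kept when not every checked column is NA.
keep <- rowSums(is.na(df[, cols, drop = FALSE])) < length(cols)

result <- df[keep, ]
```

Applied to the example above, the equivalent call would be x[rowSums(is.na(x[, 5:9])) < 5, ], which selects the same rows as keep.rows.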
