Question
I have a data frame in which some of the values are NA. I would like to remove the columns that contain them.
My data.frame looks like this:
  v1 v2
1  1 NA
2  1  1
3  2  2
4  1  1
5  2  2
6  1 NA
I tried to compute the column means and keep only the columns whose mean is not NA. I tried this statement, but it does not work:
data=subset(Itun, select=c(is.na(colMeans(Itun))))
I got an error,
error : 'x' must be an array of at least two dimensions
Can anyone give me some help?
Answer 1:
The data:
Itun <- data.frame(v1 = c(1,1,2,1,2,1), v2 = c(NA, 1, 2, 1, 2, NA))
This will remove all columns containing at least one NA:
Itun[ , colSums(is.na(Itun)) == 0]
An alternative way is to use apply:
Itun[ , apply(Itun, 2, function(x) !any(is.na(x)))]
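One caveat worth adding, which the answer does not mention: in this example only one column survives, and matrix-style indexing with a comma then simplifies the result to a plain vector. A minimal sketch of how to keep a data.frame, using the same Itun (the variable names keep, as_vector, as_df and as_df2 are illustrative only):

```r
# Same example data as above
Itun <- data.frame(v1 = c(1, 1, 2, 1, 2, 1), v2 = c(NA, 1, 2, 1, 2, NA))

# Logical index of the columns that contain no NA
keep <- colSums(is.na(Itun)) == 0

# Matrix-style indexing simplifies a single remaining column to a vector
as_vector <- Itun[, keep]

# drop = FALSE keeps the result a data.frame
as_df <- Itun[, keep, drop = FALSE]

# List-style indexing (no comma) also returns a data.frame
as_df2 <- Itun[keep]
```

Either of the last two forms is the safer choice when the number of surviving columns is not known in advance.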
Answer 2:
Here's a convenient way to do this using the dplyr function select_if(). Combine not (!), any() and is.na(), which is equivalent to selecting all columns that don't contain any NA values.
library(dplyr)
Itun %>%
  select_if(~ !any(is.na(.)))
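In dplyr 1.0.0 and later, select_if() is superseded in favor of select() combined with where(). The following is a sketch of the equivalent call, assuming a recent dplyr is installed; the name no_na is illustrative only:

```r
library(dplyr)

# Same example data as above
Itun <- data.frame(v1 = c(1, 1, 2, 1, 2, 1), v2 = c(NA, 1, 2, 1, 2, NA))

# where() keeps the columns for which the predicate returns TRUE
no_na <- Itun %>%
  select(where(~ !anyNA(.x)))
```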
Answer 3:
You can use transpose twice:
newdf <- t(na.omit(t(df)))
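Note that t() coerces a data frame to a matrix, so the result of this approach is a matrix rather than a data frame, and columns of mixed types would be converted to a common type. The sketch below applies the same idea to the Itun example (in place of the answer's df) and converts back with as.data.frame(); the name m is illustrative only:

```r
# Same example data as above
Itun <- data.frame(v1 = c(1, 1, 2, 1, 2, 1), v2 = c(NA, 1, 2, 1, 2, NA))

# t() turns columns into rows; na.omit() then drops the rows
# (the original columns) that contain NA; the outer t() flips back
m <- t(na.omit(t(Itun)))
is.matrix(m)  # the result is a matrix, not a data.frame

# Convert back if a data.frame is needed
newdf <- as.data.frame(m)
```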
Answer 4:
data[,!apply(is.na(data), 2, any)]
Answer 5:
A base R method related to the apply answers is:
Itun[!unlist(vapply(Itun, anyNA, logical(1)))]
  v1
1  1
2  1
3  2
4  1
5  2
6  1
Here, vapply is used because we are operating on a list and, unlike apply, it does not coerce the object into a matrix. Also, since we know that the output will be a logical vector of length 1, we can feed this to vapply and potentially get a little speed boost. For the same reason, I used anyNA instead of any(is.na()).
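To illustrate the coercion point: apply() first converts the data frame with as.matrix(), which yields a character matrix when column types are mixed, whereas vapply() visits each column as it is. The data frame mixed below is a hypothetical example, not taken from the question:

```r
# Hypothetical data frame with mixed column types
mixed <- data.frame(id = 1:3,
                    name = c("a", NA, "c"),
                    score = c(0.5, 1.5, 2.5))

# apply() would operate on as.matrix(mixed), a character matrix here
is.character(as.matrix(mixed))

# vapply() iterates over the columns directly and guarantees
# a logical vector with one element per column
has_na <- vapply(mixed, anyNA, logical(1))

# Keep only the columns without NA
clean <- mixed[!has_na]
```

For an NA check the coercion does not change the result, but it is extra work that vapply() avoids.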
Answer 6:
Another alternative, in base R, would be to make use of the Filter function:
Filter(function(x) !any(is.na(x)), Itun)
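A slightly shorter variant of the same idea, offered as a sketch rather than as part of the original answer, combines Negate() with anyNA(); the name no_na is illustrative only:

```r
# Same example data as above
Itun <- data.frame(v1 = c(1, 1, 2, 1, 2, 1), v2 = c(NA, 1, 2, 1, 2, NA))

# Negate(anyNA) returns TRUE for columns that contain no NA;
# Filter() keeps those columns and returns a data.frame
no_na <- Filter(Negate(anyNA), Itun)
```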
With data.table it would be a little more cumbersome:
library(data.table)
setDT(Itun)[, .SD, .SDcols = setdiff(1:ncol(Itun),
                                     which(colSums(is.na(Itun)) > 0))]
Source: https://stackoverflow.com/questions/12454487/remove-columns-from-dataframe-where-some-of-values-are-na