I have a data frame where all the variables are of character type. Many of the columns are completely empty, i.e. only the variable headers are there, but no values. Is there an easy way to remove these empty columns from the data frame?
A simple solution using the purrr package:
purrr::discard(my_data_frame, ~all(is.na(.)))
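For instance, with a made-up data frame (the names here are only for illustration), the all-NA columns are dropped:

my_data_frame <- data.frame(
  id    = c("a", "b", "c"),
  notes = NA_character_,      # completely empty column
  score = c("1", "2", "3"),
  flag  = NA_character_,      # completely empty column
  stringsAsFactors = FALSE
)
purrr::discard(my_data_frame, ~ all(is.na(.)))
# returns a data frame containing only `id` and `score`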
If you know the column indices, you can use
df[, -c(3, 5, 7)]
This will omit columns 3, 5, and 7.
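If you don't know the indices in advance, one way to compute them (a sketch I'm adding here, not part of the original answer) is:

empty_idx <- which(colSums(!is.na(df)) == 0)              # indices of all-NA columns
if (length(empty_idx) > 0) df <- df[, -empty_idx, drop = FALSE]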
I have a similar situation -- I'm working with a large public records database but when I whittle it down to just the date range and category that I need, there are a ton of columns that aren't in use. Some are blank and some are NA.
The selected answer (https://stackoverflow.com/a/17672737/233467) didn't work for me, but this did:
df[!sapply(df, function(x) all(is.na(x) | x == ""))]
It depends what you mean by empty: is it NA or "", or can it even be " "? Something like this might work:
df[, !apply(df, 2, function(x) all(gsub(" ", "", x) == "", na.rm = TRUE))]
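For example (toy data, not from the original answer), a column holding only spaces is dropped as well:

dat <- data.frame(id = c("x", "y"), blank = c(" ", "  "), val = c("1", "2"),
                  stringsAsFactors = FALSE)
dat[, !apply(dat, 2, function(x) all(gsub(" ", "", x) == "", na.rm = TRUE))]
# `blank` is removed even though its cells are neither NA nor ""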
You can do either of the following:
emptycols <- sapply(df, function (k) all(is.na(k)))
df <- df[!emptycols]
or:
emptycols <- colSums(is.na(df)) == nrow(df)
df <- df[!emptycols]
If by empty you mean they are "", the second approach can be adapted like so:
emptycols <- colSums(df == "") == nrow(df)
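Either way, emptycols is a named logical vector, so you can inspect what will be dropped before removing it (a small addition of mine, not part of the original answer):

names(df)[emptycols]   # names of the empty columns
sum(emptycols)         # how many there are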
Here is something that can be modified to exclude columns containing any of the values you specify.
newdf <- df[, apply(df, 2, function(x) !any(is.na(x) | x == "" | x == "-4"))]