Suppose a data set has several rows and columns, and some of the columns are entirely 0 (I mean all values in the column are 0s). How can one filter out those columns? I have tried w
training_data[,apply(training_data, MARGIN = 2, FUN = function(x) !all(x == 0))]
Just another way, using lapply, as it is a data.frame. apply internally converts a data.frame to a matrix, I believe.
df[!unlist(lapply(df, function(x) all(x==0)))]
Or in your case:
df[, 1:99][!unlist(lapply(df[, 1:99], function(x) all(x==0)))]
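As a quick check of the lapply idea, here is a minimal sketch on made-up data (the data frame and its column names are hypothetical, not from the question). It uses vapply instead of unlist(lapply(...)), which is my own substitution so that the result is guaranteed to be a logical vector:

```r
# Hypothetical toy data; the column names are made up for illustration.
df <- data.frame(a = c(0, 0, 0), b = c(1, 0, 2), c = c(0, 0, 0), d = c(0, 5, 0))

# TRUE for every column that consists entirely of zeros
all_zero <- vapply(df, function(x) all(x == 0), logical(1))

# Keep only the columns that are not all zero
df_clean <- df[!all_zero]
names(df_clean)
```

Here only b and d should survive, since a and c contain nothing but zeros.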
Edit: Another way, using colSums. The trick is to use it after checking for 0.
df[colSums(df == 0) != nrow(df)]
If you know which columns are numeric (say, 1:99), then replace df with:
df[, 1:99][colSums(df[, 1:99] == 0) != nrow(df)]
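If you do not know the numeric column positions in advance, one possible variant is to detect them first. This is only a sketch under my own assumptions: the data frame, the label column and the helper names num_cols and zero_cols are all hypothetical.

```r
# Hypothetical data with one non-numeric column
df <- data.frame(label = c("x", "y", "z"),
                 a = c(0, 0, 0),
                 b = c(1, 0, 2),
                 stringsAsFactors = FALSE)

# Names of the numeric columns
num_cols <- names(df)[vapply(df, is.numeric, logical(1))]

# Numeric columns in which the count of zeros equals the number of rows
zero_cols <- num_cols[colSums(df[num_cols] == 0) == nrow(df)]

# Drop the all-zero columns, keep everything else (including label)
df_clean <- df[setdiff(names(df), zero_cols)]
names(df_clean)
```

The non-numeric label column is never compared against 0, so it is kept as is.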
Filter(function(x) !all(x == 0), df)
I had the same question.
training_data[, colSums(training_data != 0) > 0]
Based on the question update (filter applied to columns 1 to 99):
idx <- which(colSums(training_data[, 1:99] != 0) == 0) # find all-zero columns
training_data[, setdiff(seq_along(training_data), idx)] # exclude those columns
I think that in the solutions using all(x == 0) it is slightly more efficient to use any(x != 0), because any stops after the first instance of an element being != 0, which will be important with a growing number of rows.
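Whatever the speed difference turns out to be, the two predicates select the same columns. A small sketch on made-up data (the data frame is hypothetical) to confirm that:

```r
# Hypothetical toy data, including a column whose values sum to zero
df <- data.frame(a = c(0, 0, 0), b = c(1, 0, 2), c = c(0, -3, 3))

# Keep-mask written with all() and with any()
keep_all <- vapply(df, function(x) !all(x == 0), logical(1))
keep_any <- vapply(df, function(x) any(x != 0), logical(1))

# Both masks should agree column by column
identical(keep_all, keep_any)

df[keep_any]
```

Note that both forms return NA for a column that contains NA and otherwise only zeros, so missing values need separate handling.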
To provide a different solution using plyr and colwise (dat being the dput data):
library(plyr)
f0 <- function(x) is.numeric(x) && any(x != 0)
colwise(identity, f0)(dat)
The idea is to go through every column in dat and return it (identity), but only if f0 returns TRUE, i.e. the column has at least one entry != 0 and the column is.numeric.
EDIT:
To do this for every data.frame in your list, e.g. training_data <- list(dat, dat, dat, dat):
training_data_clean <- lapply(training_data, function(z) colwise(identity, f0)(z))
sapply(training_data, dim)
[,1] [,2] [,3] [,4]
[1,] 6 6 6 6
[2,] 111 111 111 111
sapply(training_data_clean, dim)
[,1] [,2] [,3] [,4]
[1,] 6 6 6 6
[2,] 74 74 74 74
EDIT2: To retain the label column:
lapply(training_data, function(z) cbind(label = z$label, colwise(identity, f0)(z)))
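If you would rather avoid the plyr dependency, the same idea can probably be sketched in base R with Filter, which applies a predicate to each column of a data.frame. This is a substitute for colwise, not the approach above, and the small dat used here is made up for illustration:

```r
# Same predicate as above: numeric column with at least one non-zero entry
f0 <- function(x) is.numeric(x) && any(x != 0)

# Hypothetical stand-in for the real data: a label plus two numeric columns
dat <- data.frame(label = c("a", "b"),
                  x = c(0, 0),
                  y = c(1, 0),
                  stringsAsFactors = FALSE)
training_data <- list(dat, dat)

# Filter() keeps the columns for which f0 is TRUE; the label is added back
training_data_clean <- lapply(training_data, function(z) {
  cbind(label = z$label, Filter(f0, z))
})

sapply(training_data_clean, dim)
```

Each cleaned data.frame should keep label and y and lose the all-zero column x.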
You can use colSums
dat <- diag(10)
dat[1,1] <- 0
dat[5,5] <- 0
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0 0 0 0 0 0 0 0 0 0
[2,] 0 1 0 0 0 0 0 0 0 0
[3,] 0 0 1 0 0 0 0 0 0 0
[4,] 0 0 0 1 0 0 0 0 0 0
[5,] 0 0 0 0 0 0 0 0 0 0
[6,] 0 0 0 0 0 1 0 0 0 0
[7,] 0 0 0 0 0 0 1 0 0 0
[8,] 0 0 0 0 0 0 0 1 0 0
[9,] 0 0 0 0 0 0 0 0 1 0
[10,] 0 0 0 0 0 0 0 0 0 1
colSums(dat) == 0
TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
So to remove the columns that are all 0, you just do this:
dat[ ,colSums(dat)!=0]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 0 0 0 0 0 0 0 0
[2,] 1 0 0 0 0 0 0 0
[3,] 0 1 0 0 0 0 0 0
[4,] 0 0 1 0 0 0 0 0
[5,] 0 0 0 0 0 0 0 0
[6,] 0 0 0 1 0 0 0 0
[7,] 0 0 0 0 1 0 0 0
[8,] 0 0 0 0 0 1 0 0
[9,] 0 0 0 0 0 0 1 0
[10,] 0 0 0 0 0 0 0 1
EDIT
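This assumes that all values in a column have the same sign; otherwise positive and negative values can cancel out to a sum of 0. To avoid this, sum the absolute values:
dat[, colSums(abs(dat)) != 0]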
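To see why the sign matters, here is a minimal sketch with a made-up matrix (the matrix m and its column names are hypothetical) in which one column sums to zero without being all zero:

```r
# Hypothetical matrix: "cancel" sums to 0 although it has non-zero entries
m <- cbind(zero = c(0, 0, 0), cancel = c(-2, 0, 2), pos = c(1, 0, 3))

# Plain column sums would wrongly flag "cancel" for removal
keep_sum <- colSums(m) != 0

# Summing absolute values only flags columns that are truly all zero
keep_abs <- colSums(abs(m)) != 0

# Counting non-zero entries gives the same answer as the abs() version
keep_cnt <- colSums(m != 0) > 0

m[, keep_abs, drop = FALSE]
```

The drop = FALSE is my addition; it keeps the result a matrix even when only one column survives.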