Filtering out multiple columns in R

Suppose a data set has several rows and columns, and some of the columns are all 0 (I mean every value in the column is 0). How can one filter out those columns? I have tried w

6 Answers

[愿得一人]

2021-01-03 16:34

```
training_data[, apply(training_data, MARGIN = 2, FUN = function(x) !all(x == 0))]
```


醉酒成梦

2021-01-03 16:49
Here is another way using lapply, since this is a data.frame (apply internally converts a data.frame to a matrix, I believe).
```
df[!unlist(lapply(df, function(x) all(x==0)))]
```
Or in your case:
```
df[, 1:99][!unlist(lapply(df[, 1:99], function(x) all(x==0)))]
```
Edit: Another way using colSums. The trick is to use it after checking for 0.
```
df[!colSums(df == 0) == nrow(df)]
```
If you know which columns are numeric (say, 1:99), then replace df with:
```
df[,1:99][!colSums(df[,1:99] == 0) == nrow(df)]
```
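To make the difference between apply and lapply concrete, here is a small sketch. The data frame `toy` and its column names are made up for illustration and are not the asker's data.

```r
# Hypothetical toy data: one all-zero column, one mixed column, one character column
toy <- data.frame(a = c(0, 0, 0), b = c(0, 2, 0), label = c("x", "y", "z"),
                  stringsAsFactors = FALSE)

# apply() coerces the data frame to a matrix first, so with a character
# column present every column is seen as character
col_classes <- apply(toy, 2, class)

# lapply() visits each column as it is and keeps its own type
all_zero <- unlist(lapply(toy, function(x) all(x == 0)))
kept <- toy[!all_zero]
names(kept)
```

On this toy input only column `a` is dropped; `b` and `label` survive.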

夕颜

2021-01-03 16:51
```
# Filter() keeps the columns (list elements) of df for which the predicate is TRUE
Filter(function(x) !all(x == 0), df)
```
I had the same question.

陌清茗

2021-01-03 16:53

```
# keep every column that has at least one non-zero value
training_data[, colSums(training_data != 0) > 0]
```
Based on the question update (filter applied to columns 1 to 99 only):
```
idx <- which(colSums(training_data[, 1:99] != 0) == 0)    # find the all-zero columns
training_data[, setdiff(seq_along(training_data), idx)]   # exclude those columns
```
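As a rough check of the "columns 1 to 99 only" idea, here is a minimal sketch on a made-up data frame. `toy`, `feature_cols` and the column names are assumptions for illustration, with `1:3` standing in for `1:99`.

```r
# Hypothetical data: three feature columns followed by a label column
toy <- data.frame(f1 = c(0, 0), f2 = c(1, 0), f3 = c(0, 0),
                  label = c("a", "b"), stringsAsFactors = FALSE)

feature_cols <- 1:3  # stands in for 1:99 in the question

# positions of feature columns that contain no non-zero value
drop_idx <- feature_cols[colSums(toy[, feature_cols] != 0) == 0]

# keep everything else, including the columns outside feature_cols
kept <- toy[, setdiff(seq_along(toy), drop_idx), drop = FALSE]
names(kept)
```

Counting non-zero entries per column (rather than zero entries) is what makes this drop only the columns that are zero throughout.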


轻奢々

2021-01-03 16:59
I think in the solutions using all(x == 0) it is slightly more efficient to use any(x!=0), because any stops after the first instance of an element being !=0, which will be important with growing number of rows.
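Whatever the speed difference turns out to be on real data, the two predicates select the same columns. A quick sketch on made-up data (`dat_toy` is hypothetical):

```r
# Hypothetical toy data: all-zero, one positive entry, one negative entry
dat_toy <- data.frame(z = c(0, 0, 0, 0), p = c(0, 0, 3, 0), n = c(-1, 0, 0, 0))

keep_all <- vapply(dat_toy, function(x) !all(x == 0), logical(1))
keep_any <- vapply(dat_toy, function(x) any(x != 0), logical(1))

# both predicates give the same logical mask over the columns
same_mask <- identical(keep_all, keep_any)

kept <- dat_toy[keep_any]
names(kept)
```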

To provide a different solution using plyr and colwise (dat being the dputdata):
```
library(plyr)
f0 <- function(x) any(x!=0) & is.numeric(x)
colwise(identity, f0)(dat)
```
The idea is to go through every column in dat and return it unchanged (identity), but only if f0 returns TRUE, i.e. the column is numeric and has at least one entry != 0.
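If you would rather not load plyr, the same idea can be sketched in base R with Filter. This is only an illustration: `dat_toy` is a made-up stand-in for dat, and the predicate here is a variant of f0 that uses `&&` so that non-numeric columns are never compared with 0.

```r
# Hypothetical stand-in for the question's data
dat_toy <- data.frame(label = c("a", "b"), x1 = c(0, 0), x2 = c(0, 5),
                      stringsAsFactors = FALSE)

# variant of f0: numeric column with at least one non-zero entry
f0_base <- function(x) is.numeric(x) && any(x != 0)

# Filter() returns the columns for which the predicate is TRUE
kept <- Filter(f0_base, dat_toy)
names(kept)
```

As with colwise, the non-numeric label column is dropped too, so it has to be added back separately if you need it.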

EDIT: To do this for every data.frame in your list, e.g. training_data <- list(dat, dat, dat, dat):
```
training_data_clean <- lapply(training_data, function(z) colwise(identity, f0)(z))

sapply(training_data, dim)
     [,1] [,2] [,3] [,4]
[1,]    6    6    6    6
[2,]  111  111  111  111

sapply(training_data_clean, dim)
     [,1] [,2] [,3] [,4]
[1,]    6    6    6    6
[2,]   74   74   74   74
```
EDIT2: To retain the label column:
```
lapply(training_data, function(z) cbind(label = z$label, colwise(identity, f0)(z)))
```

长发绾君心

2021-01-03 17:01

You can use colSums:
```
dat <- diag(10)
dat[1,1]  <- 0
dat[5,5]  <- 0

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    0    0    0     0
 [2,]    0    1    0    0    0    0    0    0    0     0
 [3,]    0    0    1    0    0    0    0    0    0     0
 [4,]    0    0    0    1    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    0    1    0    0    0     0
 [7,]    0    0    0    0    0    0    1    0    0     0
 [8,]    0    0    0    0    0    0    0    1    0     0
 [9,]    0    0    0    0    0    0    0    0    1     0
[10,]    0    0    0    0    0    0    0    0    0     1
```

```
colSums(dat) == 0
 [1]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
```
So to remove the all-zero columns, you just do this:
```
dat[, colSums(dat) != 0]
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
 [1,]    0    0    0    0    0    0    0    0
 [2,]    1    0    0    0    0    0    0    0
 [3,]    0    1    0    0    0    0    0    0
 [4,]    0    0    1    0    0    0    0    0
 [5,]    0    0    0    0    0    0    0    0
 [6,]    0    0    0    1    0    0    0    0
 [7,]    0    0    0    0    1    0    0    0
 [8,]    0    0    0    0    0    1    0    0
 [9,]    0    0    0    0    0    0    1    0
[10,]    0    0    0    0    0    0    0    1
```

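EDIT

This assumes that all values have the same sign; otherwise positive and negative entries in a column can cancel out to a sum of 0. To avoid this, sum the absolute values instead:
```
dat[, colSums(abs(dat)) != 0]
```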
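To see the sign problem on a concrete case, here is a small sketch with a made-up matrix `m` whose first column sums to 0 without being all zeros:

```r
# Hypothetical matrix: column 1 cancels to 0, column 2 is all zero, column 3 is positive
m <- cbind(c(1, -1, 0), c(0, 0, 0), c(2, 0, 1))

plain_mask <- colSums(m) != 0        # column 1 is wrongly treated as all zero
abs_mask   <- colSums(abs(m)) != 0   # column 1 is kept
count_mask <- colSums(m != 0) > 0    # same result, by counting non-zero entries

# drop = FALSE keeps a matrix even if only one column survives
kept <- m[, abs_mask, drop = FALSE]
dim(kept)
```

Counting non-zero entries, as in the last mask, gives the same answer as the abs() version and does not depend on the values being summable without cancellation.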
