Filtering out multiple columns in R

难免孤独 2021-01-03 16:06

Suppose a data set has several rows and columns, and some of the columns are all 0 (that is, every value in the column is 0). How can one filter out those columns? I have tried w

6 answers

轻奢々 (original poster)

2021-01-03 16:59
I think that, in the solutions using all(x == 0), it is slightly more efficient to use any(x != 0) instead, because any stops at the first element that is != 0, which matters more as the number of rows grows.
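To make the two phrasings concrete, here is a minimal sketch on made-up vectors (zeros, mixed, keep_any and keep_all are illustrative names, not from the question). It only shows that both tests give the same keep/drop decision; it does not measure speed.

```r
# Toy columns, made up for illustration
zeros <- c(0, 0, 0, 0)
mixed <- c(0, 3, 0, 0)

# Keep a column when it has at least one non-zero entry
keep_any <- function(x) any(x != 0)

# The same decision phrased with all()
keep_all <- function(x) !all(x == 0)

keep_any(zeros)  # FALSE: the all-zero column is dropped
keep_any(mixed)  # TRUE: the column with a non-zero entry is kept

# Both phrasings agree on each column
identical(keep_any(zeros), keep_all(zeros))
identical(keep_any(mixed), keep_all(mixed))
```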

Here is a different solution using colwise from plyr (dat being the data from the dput output):
```
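library(plyr)
# Predicate: TRUE for numeric columns with at least one non-zero entry.
# Checking is.numeric() first with && skips the comparison for other columns.
f0 <- function(x) is.numeric(x) && any(x != 0)
# Return each column unchanged (identity), but only where f0 is TRUE
colwise(identity, f0)(dat)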
```
The idea is to go through every column in dat and return it unchanged (identity), but only if f0 returns TRUE, i.e. the column is numeric and has at least one entry != 0.
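For comparison, the same column-by-column filtering can be sketched in base R with Filter(), which is a swapped-in alternative and not the plyr call above. The data frame toy and the name keep_col are made up for illustration; keep_col follows the same rule as f0.

```r
# Toy data frame, made up for illustration (not the asker's data)
toy <- data.frame(
  a = c(1, 0, 2),
  b = c(0, 0, 0),            # all zeros: should be dropped
  label = c("x", "y", "z"),  # non-numeric: should be dropped
  stringsAsFactors = FALSE
)

# Same rule as f0: numeric and at least one non-zero entry
keep_col <- function(x) is.numeric(x) && any(x != 0)

# Filter() applies the predicate to each column of the data frame
toy_clean <- Filter(keep_col, toy)
names(toy_clean)  # "a"
```

As with f0, the non-numeric label column is dropped here, so it has to be added back separately if it is needed.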

EDIT: To do this for every data.frame in your list, e.g. training_data <- list(dat, dat, dat, dat):
```
training_data_clean <- lapply(training_data, function(z) colwise(identity, f0)(z))

sapply(training_data, dim)
     [,1] [,2] [,3] [,4]
[1,]    6    6    6    6
[2,]  111  111  111  111

sapply(training_data_clean, dim)
     [,1] [,2] [,3] [,4]
[1,]    6    6    6    6
[2,]   74   74   74   74
```
EDIT2: To retain the label column:
```
lapply(training_data, function(z) cbind(label = z$label, colwise(identity, f0)(z)))
```