Filtering out multiple columns in R

难免孤独 2021-01-03 16:06

Suppose a data set with several rows and columns in which some columns are all 0 (i.e. every value in the column is 0). How can one filter out those columns? I have tried w

6 answers
  •  轻奢々
    2021-01-03 16:59

    I think that, compared with the solutions using all(x == 0), it is slightly more efficient to use any(x != 0), because any() stops at the first element that is != 0, which matters more as the number of rows grows.
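A minimal base R sketch of the any(x != 0) test (no plyr needed), assuming purely numeric columns without NA values; the toy data frame is hypothetical. It also checks that !any(x != 0) flags the same columns as all(x == 0), so the two formulations only differ in how the test is written.

```r
# Hypothetical toy data: b is all zeros, c has a single non-zero entry
dat <- data.frame(a = c(1, 2, 3), b = c(0, 0, 0), c = c(0, 0, 5))

# One logical per column: TRUE if the column has at least one entry != 0
keep <- vapply(dat, function(x) any(x != 0), logical(1))

# Same selection as the all(x == 0) formulation, just negated
keep_all <- !vapply(dat, function(x) all(x == 0), logical(1))
identical(keep, keep_all)  # TRUE

# drop = FALSE keeps a data.frame even if only one column survives
dat_clean <- dat[, keep, drop = FALSE]
names(dat_clean)  # "a" "c"
```

If the data can contain NA, any(x != 0) may return NA for a column; adding na.rm = TRUE inside any() is one option, depending on how NA-only columns should be treated.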

    To provide a different solution using plyr and colwise (dat being the data from your dput output):

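    library(plyr)
    # TRUE for numeric columns with at least one non-zero entry;
    # && checks is.numeric() first and skips the comparison for other types
    f0 <- function(x) is.numeric(x) && any(x != 0)
    # apply identity() only to the columns for which f0 returns TRUE
    colwise(identity, f0)(dat)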
    

    The idea is to go through every column in dat and return it unchanged (identity), but only if f0 returns TRUE, i.e. the column is numeric and has at least one entry != 0.
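For comparison, a base R sketch of the same idea, assuming you would rather not depend on plyr: Filter() applies a predicate to every column of a data.frame and keeps those for which it returns TRUE, which mirrors colwise(identity, f0)(dat). The toy data, including its character label column, is hypothetical; it shows that non-numeric columns are dropped along with the all-zero ones.

```r
# f0 as in the answer, with && so is.numeric() is checked first
f0 <- function(x) is.numeric(x) && any(x != 0)

# Hypothetical toy data with a character label column
dat <- data.frame(
  label = c("x", "y", "z"),
  a = c(1, 2, 3),
  b = c(0, 0, 0),
  stringsAsFactors = FALSE
)

# Base R counterpart of colwise(identity, f0)(dat):
# Filter() keeps the columns for which f0 returns TRUE
dat_clean <- Filter(f0, dat)

# label (non-numeric) and b (all zeros) are both dropped
names(dat_clean)  # "a"
```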

    EDIT: To do this for every data.frame in your list, e.g. training_data <- list(dat, dat, dat, dat):

    training_data_clean <- lapply(training_data, function(z) colwise(identity, f0)(z))
    
    sapply(training_data, dim)
         [,1] [,2] [,3] [,4]
    [1,]    6    6    6    6
    [2,]  111  111  111  111
    
    sapply(training_data_clean, dim)
         [,1] [,2] [,3] [,4]
    [1,]    6    6    6    6
    [2,]   74   74   74   74
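The same list-wide cleanup can be sketched in base R by combining lapply() with Filter(); the small data frame below is a hypothetical stand-in for the real training set, so the dimensions differ from the 6 x 111 and 6 x 74 shown above. As in the output above, sapply(..., dim) returns one column per list element, holding c(rows, columns).

```r
f0 <- function(x) is.numeric(x) && any(x != 0)

# Hypothetical toy data standing in for the real training set
dat <- data.frame(a = 1:3, b = c(0, 0, 0), c = c(0, 0, 5))
training_data <- list(dat, dat, dat, dat)

# Apply the same column filter to every data.frame in the list
training_data_clean <- lapply(training_data, function(z) Filter(f0, z))

# Rows are unchanged; only the all-zero column b is removed
sapply(training_data, dim)        # 3 rows, 3 columns for every element
sapply(training_data_clean, dim)  # 3 rows, 2 columns for every element
```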
    

    EDIT2: To retain the label column:

    lapply(training_data, function(z) cbind(label = z$label, colwise(identity, f0)(z)))
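One caveat with cbind(label = z$label, ...): if label were itself a numeric column with non-zero values, f0 would keep it and it would appear twice. A hedged base R variant selects columns by name instead; the helper keep_with_label and the toy data are hypothetical, and it assumes every data.frame has a column called "label".

```r
f0 <- function(x) is.numeric(x) && any(x != 0)

# Hypothetical data: a character label plus numeric features
dat <- data.frame(
  label = c("x", "y", "z"),
  a = c(1, 2, 3),
  b = c(0, 0, 0),
  stringsAsFactors = FALSE
)
training_data <- list(dat, dat)

# Keep "label" first, then every other column that passes f0;
# setdiff() prevents a numeric label column from being selected twice
keep_with_label <- function(z) {
  kept <- setdiff(names(Filter(f0, z)), "label")
  z[, c("label", kept), drop = FALSE]
}

training_data_clean <- lapply(training_data, keep_with_label)
names(training_data_clean[[1]])  # "label" "a"
```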
    
