Filtering out multiple columns in R

难免孤独 2021-01-03 16:06

Suppose a data set with several rows and columns, where some columns are entirely 0 (I mean all values in the column are 0's). How can one filter out those columns? I have tried w

6 Answers
  • 2021-01-03 16:34
    training_data[,apply(training_data, MARGIN = 2, FUN = function(x) !all(x == 0))]
    
  • 2021-01-03 16:49

    Just another way, using lapply since df is a data.frame. apply first converts a data.frame to a matrix (via as.matrix), which lapply avoids.

    df[!unlist(lapply(df, function(x) all(x==0)))]
    

    Or in your case:

    df[, 1:99][!unlist(lapply(df[, 1:99], function(x) all(x==0)))]
    

    Edit: Another way using colSums. The trick is to use it after checking for 0.

    df[!colSums(df == 0) == nrow(df)]
    

    If you know which columns are numeric (say, 1:99), then replace df with:

    df[,1:99][!colSums(df[,1:99] == 0) == nrow(df)]
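
    As a rough illustration of the colSums idea, here is a minimal sketch on a made-up data frame (the column names a to d are assumptions, not the asker's data). It counts the non-zero entries per column and keeps the columns where that count is positive, which is equivalent to the comparison against nrow(df) above as long as there are no NA values.

    ```r
    # Hypothetical toy data; column names are made up for illustration.
    df <- data.frame(a = c(0, 0, 0),
                     b = c(1, 0, 2),
                     c = c(0, 0, 0),
                     d = c(0, 3, 0))

    # TRUE for every column that has at least one non-zero entry
    keep <- colSums(df != 0) > 0

    # drop = FALSE keeps a data.frame even if only one column survives
    result <- df[, keep, drop = FALSE]
    names(result)
    ```

    Here only b and d survive, because a and c contain nothing but zeros.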
    
  • 2021-01-03 16:51
    Filter(function(x) !all(x == 0), df)  # keep columns that are not all zero
    

    I had the same question.
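
    A small sketch of how Filter behaves on a data frame, assuming a made-up table with one non-numeric column (the names id, a and b are hypothetical). Filter treats the data frame as a list of columns and keeps those for which the predicate returns TRUE; the is.numeric guard is an addition so that text columns are left alone.

    ```r
    # Hypothetical toy data; names are illustrative only.
    df <- data.frame(id = c("x", "y", "z"),
                     a  = c(0, 0, 0),
                     b  = c(0, 5, 0),
                     stringsAsFactors = FALSE)

    # Keep a column unless it is numeric and made up entirely of zeros.
    not_all_zero <- function(col) !(is.numeric(col) && all(col == 0))

    result <- Filter(not_all_zero, df)
    names(result)
    ```

    The result is still a data.frame; column a is dropped while id and b are kept.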

  • 2021-01-03 16:53
    training_data[, colSums(training_data != 0) > 0]  # keep columns with at least one non-zero value
    

    Based on the question update (filter applied to columns 1 to 99):

    idx <- which(colSums(training_data[, 1:99] != 0) == 0)    # find all-zero columns
    training_data[, setdiff(seq_along(training_data), idx)]   # exclude those columns
    
  • 2021-01-03 16:59

    I think in the solutions using all(x == 0) it is slightly more efficient to use any(x != 0), because any can stop at the first element that is != 0, which matters more as the number of rows grows. (The comparison x != 0 itself is still evaluated for the whole column first, so the gain is small.)
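
    To make the equivalence concrete, here is a short sketch on made-up vectors. It also shows one way to handle missing values: passing na.rm = TRUE to any is an assumption about how NA should be treated (here an NA does not count as a non-zero entry), not something stated in the question.

    ```r
    x_zero  <- c(0, 0, 0)
    x_mixed <- c(0, 2, 0)
    x_na    <- c(0, NA, 0)

    # Hypothetical helper: TRUE if the vector has at least one known non-zero value
    has_nonzero <- function(x) any(x != 0, na.rm = TRUE)

    res <- c(zero  = has_nonzero(x_zero),
             mixed = has_nonzero(x_mixed),
             na    = has_nonzero(x_na))

    # Without NA values, both formulations give the same answer
    same <- identical(any(x_mixed != 0), !all(x_mixed == 0))
    ```

    Without na.rm = TRUE, any(x_na != 0) would return NA, and an NA inside a column index raises an error or yields an NA column, depending on the object being indexed.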

    To provide a different solution using plyr and colwise (dat being the dput data):

    library(plyr)
    f0 <- function(x) is.numeric(x) && any(x != 0)  # numeric column with a non-zero entry
    colwise(identity, f0)(dat)
    

    The idea is to go through every column in dat and return it (identity), but only if f0 returns TRUE, i.e. the column is numeric and has at least one entry != 0.
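
    For comparison, the same predicate-based selection can be sketched in base R without the plyr dependency. The toy data frame and its column names below are assumptions for illustration; f0 is restated so the snippet runs on its own.

    ```r
    # Hypothetical toy data standing in for dat
    dat <- data.frame(label = c("a", "b", "c"),
                      v1    = c(0, 0, 0),
                      v2    = c(1, 0, 0),
                      stringsAsFactors = FALSE)

    # Same predicate as above: numeric and at least one non-zero entry
    f0 <- function(x) is.numeric(x) && any(x != 0)

    # One TRUE/FALSE per column, then select columns by that logical vector
    keep <- vapply(dat, f0, logical(1))
    dat_clean <- dat[keep]
    names(dat_clean)
    ```

    As with the colwise version, the non-numeric label column is dropped along with the all-zero v1, so it has to be added back separately if it is needed.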

    EDIT: To do this for every data.frame in your list, e.g. training_data <- list(dat, dat, dat, dat):

    training_data_clean <- lapply(training_data, function(z) colwise(identity, f0)(z))
    
    sapply(training_data, dim)
         [,1] [,2] [,3] [,4]
    [1,]    6    6    6    6
    [2,]  111  111  111  111
    
    sapply(training_data_clean, dim)
         [,1] [,2] [,3] [,4]
    [1,]    6    6    6    6
    [2,]   74   74   74   74
    

    EDIT2: To retain the label column:

    lapply(training_data, function(z) cbind(label = z$label, colwise(identity, f0)(z)))
    
  • 2021-01-03 17:01

    You can use colSums

    dat <- diag(10)
    dat[1,1]  <- 0
    dat[5,5]  <- 0
    
         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
     [1,]    0    0    0    0    0    0    0    0    0     0
     [2,]    0    1    0    0    0    0    0    0    0     0
     [3,]    0    0    1    0    0    0    0    0    0     0
     [4,]    0    0    0    1    0    0    0    0    0     0
     [5,]    0    0    0    0    0    0    0    0    0     0
     [6,]    0    0    0    0    0    1    0    0    0     0
     [7,]    0    0    0    0    0    0    1    0    0     0
     [8,]    0    0    0    0    0    0    0    1    0     0
     [9,]    0    0    0    0    0    0    0    0    1     0
    [10,]    0    0    0    0    0    0    0    0    0     1
    
    colSums(dat) == 0
     TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
    

    So to remove the all-zero columns, you just do this:

    dat[  ,colSums(dat)!=0]
         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
     [1,]    0    0    0    0    0    0    0    0
     [2,]    1    0    0    0    0    0    0    0
     [3,]    0    1    0    0    0    0    0    0
     [4,]    0    0    1    0    0    0    0    0
     [5,]    0    0    0    0    0    0    0    0
     [6,]    0    0    0    1    0    0    0    0
     [7,]    0    0    0    0    1    0    0    0
     [8,]    0    0    0    0    0    1    0    0
     [9,]    0    0    0    0    0    0    1    0
    [10,]    0    0    0    0    0    0    0    1
    

    EDIT

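    This assumes that all values have the same sign; otherwise positive and negative entries can cancel and give a column sum of 0. To avoid this, sum the absolute values (here for columns 1 to 99 of the question's data):

    dat[, 1:99][, colSums(abs(dat[, 1:99])) != 0]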
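
    A tiny sketch of why the sign matters, on a made-up matrix (the column names a, b and c are assumptions). Column a sums to 0 even though it is not all zero, so the plain colSums test would wrongly drop it, while the abs version keeps it.

    ```r
    # Hypothetical matrix: column a cancels out, b is all zero, c is positive
    m <- cbind(a = c(-1, 1, 0),
               b = c(0, 0, 0),
               c = c(2, 0, 0))

    plain_sum <- colSums(m) != 0       # treats column a as if it were all zero
    abs_sum   <- colSums(abs(m)) != 0  # only column b is flagged as all zero

    kept <- m[, abs_sum, drop = FALSE]
    colnames(kept)
    ```

    With abs, columns a and c are kept and only the true all-zero column b is removed.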
    