Remove the columns whose colSums are 0

温柔的废话 2020-12-22 00:35

I have a matrix whose elements are 0, 1, 2, or NA.
I want to remove the columns whose colSums are equal to 0 or NA, i.e. drop these columns from the original matrix.

1 Answer
  • 2020-12-22 01:21

    Work out which ones have colSums != 0:

    i <- colSums(mat, na.rm = TRUE) != 0  # TRUE if the column sum (ignoring NAs) is non-zero, FALSE otherwise
    

    Then you can either select or drop them, e.g.:

    matnonzero <- mat[, i, drop = FALSE]  # all the non-zero columns (drop = FALSE keeps a matrix even if one column is left)
    matzeros <- mat[, !i, drop = FALSE]   # all the zero columns
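    A minimal self-contained sketch of the two steps above, assuming a small hypothetical matrix `mat` built to match the question (entries 0, 1, 2 or NA). Note that with `na.rm = TRUE` an all-NA column sums to 0, so it is treated as a zero column.

```r
# Hypothetical example matrix, filled column by column:
# a = 1, 0, 2   b = 0, 0, 0   c = NA, 2, 1   d = NA, NA, NA
mat <- matrix(c(1, 0, 2,
                0, 0, 0,
                NA, 2, 1,
                NA, NA, NA),
              nrow = 3,
              dimnames = list(NULL, c("a", "b", "c", "d")))

# TRUE for columns whose sum (ignoring NAs) is non-zero
i <- colSums(mat, na.rm = TRUE) != 0

matnonzero <- mat[, i, drop = FALSE]   # columns a and c
matzeros   <- mat[, !i, drop = FALSE]  # columns b and d (d is all NA)

print(i)
print(matnonzero)
print(matzeros)
```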
    

    Update in response to a comment (are there ways to do it without colSums?): IMO, yes, there are, but colSums is one of the more elegant and efficient ways.

    You could do something like:

    apply(is.na(mat) | mat == 0, 2, all)
    

    which returns TRUE for each column that consists entirely of NAs and/or 0s, so that

    mat[, !apply(is.na(mat) | mat == 0, 2, all), drop = FALSE]
    

    will return all the non-zero columns.
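    The two tests agree whenever the entries are non-negative, as in the question (0, 1, 2, NA). With negative values they can disagree, because positive and negative entries can cancel to a zero sum. A small check on hypothetical matrices; the helper names `drop_by_sum` and `drop_by_all` are made up for illustration.

```r
# Rule 1 (colSums): TRUE = drop, if the column sum ignoring NAs is zero
drop_by_sum <- function(m) colSums(m, na.rm = TRUE) == 0
# Rule 2 (apply): TRUE = drop, if every entry is NA or 0
drop_by_all <- function(m) apply(is.na(m) | m == 0, 2, all)

# Non-negative entries, as in the question
nonneg <- cbind(x = c(0, NA, 0), y = c(1, 0, NA), z = c(NA, NA, NA))
# Signed entries: column p sums to 0 but is not all zero
signed <- cbind(p = c(-1, 1, NA), q = c(0, 0, 0))

print(drop_by_sum(nonneg))  # x, z dropped by both rules
print(drop_by_all(nonneg))
print(drop_by_sum(signed))  # p dropped here ...
print(drop_by_all(signed))  # ... but kept here
```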

    However, colSums is faster than apply (roughly five times faster in this benchmark):

    system.time( replicate(1000, mat[, !apply(is.na(mat) | mat == 0, 2, all)]) )
    #   user  system elapsed 
    #  0.068   0.000   0.069 
    system.time( replicate(1000, mat[, colSums(mat, na.rm=T) != 0]))
    #   user  system elapsed 
    #  0.012   0.000   0.013 
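    The answer does not show how `mat` was built for the timings above. A sketch to reproduce the comparison, assuming a random 100 x 50 matrix of 0, 1, 2 and NA (the sizes and probabilities are arbitrary choices). Absolute times depend on the machine, so none are shown, but both expressions should select the same columns.

```r
set.seed(1)  # reproducible hypothetical data
vals <- sample(c(0, 1, 2, NA), 100 * 50, replace = TRUE,
               prob = c(0.7, 0.1, 0.1, 0.1))
mat <- matrix(vals, nrow = 100)
mat[, 1:5] <- 0  # force a few all-zero columns so something gets dropped
mat[, 6] <- NA   # and one all-NA column

via_apply   <- mat[, !apply(is.na(mat) | mat == 0, 2, all), drop = FALSE]
via_colsums <- mat[, colSums(mat, na.rm = TRUE) != 0, drop = FALSE]

print(system.time(replicate(1000, mat[, !apply(is.na(mat) | mat == 0, 2, all)])))
print(system.time(replicate(1000, mat[, colSums(mat, na.rm = TRUE) != 0])))
```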
    

    I'm sure there are many other ways to do it too.


    Update again, as the OP keeps adding to their question in the comments. The new question is: remove all columns that

    • contain a 0 or an NA, or
    • contain the same value in every row.

    The mechanics are unchanged: you just compute a logical (TRUE or FALSE) for each column that decides whether to keep it or not.

    For example, just as you drop a column if all its values are NA or 0, for your second condition you could write length(unique({column})) == 1, or all(diff({column}) == 0), or one of many other equivalent tests.
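    A small sketch comparing the two "constant column" tests on hypothetical vectors (the function names are made up for illustration). The parentheses matter: the comparison has to sit inside `all()`. The tests also treat NA differently: `unique()` counts NA as a distinct value, whereas `diff()` propagates it, so `all(diff(x) == 0)` returns NA when `x` contains an NA.

```r
is_constant_unique <- function(x) length(unique(x)) == 1
is_constant_diff   <- function(x) all(diff(x) == 0)

print(is_constant_unique(c(1, 1, 1)))   # TRUE
print(is_constant_diff(c(1, 1, 1)))     # TRUE
print(is_constant_unique(c(1, 2, 1)))   # FALSE
print(is_constant_diff(c(1, 2, 1)))     # FALSE
print(is_constant_unique(c(2, NA, 2)))  # FALSE: NA counts as a second value
print(is_constant_diff(c(2, NA, 2)))    # NA: diff() propagates the NA
```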

    So to combine them, remember that apply(X, 2, FUN) will apply the function FUN to every column of X.

    So you could do:

    i <- apply(mat,
          2,  # apply over columns
          function(column) {
              # discard if the column has any NA or 0, or only one distinct value
              any(is.na(column) | column == 0) ||
              length(unique(column)) == 1
          })
    

    which returns TRUE if the column has any NAs or 0s, or if the entire column has only one unique value. So TRUE means we should discard that column. Then you subset your matrix just as before, i.e.:

    mat[, !i, drop = FALSE]
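    Putting it together, a runnable sketch on a hypothetical matrix built so that each column exercises one rule. The second computation is an alternative that is not from the answer: it replaces the per-column `apply` with a vectorized `colSums` call for the "any 0 or NA" test, in the spirit of the speed comparison earlier, and keeps `apply` only for the constant-column test.

```r
# Hypothetical matrix, filled column by column:
# a = 1, 2, 1 (keep)        b = 1, 0, 2 (has a 0)   c = 2, NA, 1 (has an NA)
# d = 2, 2, 2 (constant)    e = 2, 1, 1 (keep)
mat <- matrix(c(1, 2, 1,
                1, 0, 2,
                2, NA, 1,
                2, 2, 2,
                2, 1, 1),
              nrow = 3,
              dimnames = list(NULL, c("a", "b", "c", "d", "e")))

# TRUE means "discard this column"
i <- apply(mat, 2, function(column) {
  any(is.na(column) | column == 0) || length(unique(column)) == 1
})
kept <- mat[, !i, drop = FALSE]
print(kept)  # columns a and e

# Alternative: count NA-or-0 cells per column in one vectorized call
has_zero_or_na <- colSums(is.na(mat) | mat == 0) > 0
is_constant    <- apply(mat, 2, function(column) length(unique(column)) == 1)
i2 <- has_zero_or_na | is_constant
print(i2)
```

    `is.na(mat) | mat == 0` contains no NAs (an NA cell gives `TRUE | NA`, which is TRUE), so `colSums` needs no `na.rm` here.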
    

    If you wish to add further conditions different from the ones you have already asked for, think them through and give it a try yourself, and if you still can't get it working, ask a new question rather than modifying this one again.
