Using `:=` in data.table to sum the values of two columns in R, ignoring NAs

耶瑟儿~ 2020-12-05 10:41

I have what I think is a very simple question related to the use of data.table and the := function. I don't think I quite understand the behaviour of :=

2 answers
  • 2020-12-05 11:28

    This is standard R behaviour; it has nothing to do with data.table.

    Adding anything to NA returns NA:

    NA + 1
    ## NA
    

    sum, by contrast, returns a single number for everything passed to it, not one value per row.
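To make the difference concrete, here is a small base R sketch. The vectors reuse the first three rows of the table shown later in this thread; the variable names `elementwise` and `total` are only for illustration.

```r
col1 <- c(NA, 0, -0.015038)
col2 <- c(0.003745, 0.007463, -0.007407)

# `+` is vectorised: one result per element, NA wherever either input is NA
elementwise <- col1 + col2

# sum() collapses everything it is given into one number
total <- sum(col1, col2, na.rm = TRUE)

print(elementwise)
print(total)
```

Assigning `total` with `:=` would recycle that one number into every row, which is presumably not what is wanted here.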

    If you want 1 + NA to return 1, you will have to run something like:

    mat[,col3 := col1 + col2]
    mat[is.na(col1), col3 := col2]
    mat[is.na(col2), col3 := col1]
    

    The last two lines handle the rows where col1 or col2 is NA.
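For anyone who wants to run the three-step version, a minimal sketch follows. The table contents are made up for illustration (they are not the asker's data), and the last row is deliberately missing in both columns.

```r
library(data.table)

# Hypothetical table; the last row is NA in both columns
mat <- data.table(col1 = c(NA, 0, 2, NA), col2 = c(1, NA, 3, NA))

mat[, col3 := col1 + col2]        # NA wherever either input is NA
mat[is.na(col1), col3 := col2]    # col1 missing: fall back to col2
mat[is.na(col2), col3 := col1]    # col2 missing: fall back to col1

print(mat)
```

With this approach a row that is NA in both columns stays NA in col3, because both fallbacks copy an NA.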


    EDIT - an easier solution

    You could also use rowSums, which has a na.rm argument

    mat[, col3 := rowSums(.SD, na.rm = TRUE), .SDcols = c("col1", "col2")]
    

    rowSums is what you want: by definition, it gives the row sums of a matrix containing col1 and col2, with NA values removed.

    (@JoshuaUlrich suggested this in a comment.)
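One thing worth checking before relying on rowSums: with na.rm = TRUE, a row in which every value is NA sums to 0 rather than NA. The sketch below uses a made-up table `dt` to show this, plus one possible way to put NA back for such rows if that is the behaviour you need.

```r
library(data.table)

# Hypothetical data: the third row has NA in both columns
dt <- data.table(col1 = c(1, NA, NA), col2 = c(2, 5, NA))

dt[, col3 := rowSums(.SD, na.rm = TRUE), .SDcols = c("col1", "col2")]
print(dt)  # the all-NA row gets a sum of 0 here

# Optional: restore NA where both inputs were missing
dt[is.na(col1) & is.na(col2), col3 := NA_real_]
print(dt)
```

Whether 0 or NA is the right answer for an all-missing row depends on the data, so the second step is a choice rather than a fix.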

  • 2020-12-05 11:32

    It's not a lack of understanding of data.table but rather one regarding vectorized functions in R. You can define a dyadic operator that will behave differently than the "+" operator with regard to missing values:

    `%+na%` <- function(x, y) ifelse(is.na(x), y, ifelse(is.na(y), x, x + y))

    mat[, col3 := col1 %+na% col2]
    #-------------------------------
            col1      col2      col3
    1:        NA  0.003745  0.003745
    2:  0.000000  0.007463  0.007463
    3: -0.015038 -0.007407 -0.022445
    4:  0.003817 -0.003731  0.000086
    5: -0.011407 -0.007491 -0.018898
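The operator can also be tried on plain vectors, without data.table. This sketch repeats the definition so that it runs on its own; the input vectors and the name `result` are made up to cover each case (left missing, right missing, both missing, neither missing).

```r
`%+na%` <- function(x, y) ifelse(is.na(x), y, ifelse(is.na(y), x, x + y))

x <- c(NA, 1, NA, 4)
y <- c(2, NA, NA, 6)

result <- x %+na% y
print(result)
```

Unlike rowSums with na.rm = TRUE, this keeps NA when both inputs are missing.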
    

    You can use mrdwad's comment to do it with sum(... , na.rm=TRUE):

    mat[ , col4 := sum(col1, col2, na.rm=TRUE), by=1:NROW(mat)]
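A runnable sketch of the row-by-row version, using the values from the table above. The col5 column is my addition, not part of the answer; it is there only to check that grouping by row number gives the same result as rowSums on this data.

```r
library(data.table)

mat <- data.table(
  col1 = c(NA, 0, -0.015038, 0.003817, -0.011407),
  col2 = c(0.003745, 0.007463, -0.007407, -0.003731, -0.007491)
)

# One group per row, so sum() only ever sees that row's two values
mat[, col4 := sum(col1, col2, na.rm = TRUE), by = 1:NROW(mat)]

# Same calculation via rowSums, for comparison
mat[, col5 := rowSums(.SD, na.rm = TRUE), .SDcols = c("col1", "col2")]

print(mat)
```

Grouping by every row means one sum() call per row, so on a large table the rowSums form is likely to be the quicker of the two.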
    