I have what I think is a very simple question related to the use of data.table and the := function. I don't think I quite understand the behaviour of :=.
This is standard R behaviour and has nothing really to do with data.table.
Adding anything to NA returns NA:
NA + 1
## NA
sum, by contrast, collapses all of its arguments into a single number rather than returning one value per row.
If you want 1 + NA to return 1, you will have to run something like:
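As a minimal base R sketch of the difference between the two behaviours (the vectors below are made-up stand-ins for col1 and col2, not the data from the question):

```r
# Hypothetical vectors standing in for col1 and col2
col1 <- c(NA, 1, 2)
col2 <- c(10, 20, NA)

# "+" is vectorised: one result per element, NA wherever either input is NA
elementwise <- col1 + col2

# sum() pools every value from both vectors into one number
total <- sum(col1, col2, na.rm = TRUE)

print(elementwise)
print(total)
```

Used inside := without a by clause, that single total would be recycled into every row, which is usually not what is wanted here.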
mat[,col3 := col1 + col2]
mat[is.na(col1), col3 := col2]
mat[is.na(col2), col3 := col1]
The second and third lines handle the rows where col1 or col2 is NA. If both are NA, col3 stays NA.
You could also use rowSums, which has an na.rm argument:
mat[ , col3 :=rowSums(.SD, na.rm = TRUE), .SDcols = c("col1", "col2")]
rowSums is what you want: by definition, it gives the row sums of a matrix containing col1 and col2, with NA values removed.
(@JoshuaUlrich suggested this as a comment.)
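One edge case worth knowing: with na.rm = TRUE, rowSums returns 0 for a row in which every value is NA. The sketch below uses a small made-up table (not the data from the question) and shows one possible way to put NA back in such rows, if that is the behaviour you prefer:

```r
library(data.table)

# Hypothetical data; row 3 has both columns missing
mat <- data.table(col1 = c(NA, 1, NA), col2 = c(2, 3, NA))

# Row-wise sum that skips NA values; an all-NA row becomes 0
mat[, col3 := rowSums(.SD, na.rm = TRUE), .SDcols = c("col1", "col2")]
zero_for_all_na <- mat$col3[3]

# Optional: restore NA where every input was missing
mat[is.na(col1) & is.na(col2), col3 := NA_real_]

print(mat)
```

Whether 0 or NA is the right answer for an all-NA row depends on what the column means, so treat the last step as optional.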
It's not a lack of understanding of data.table but rather one regarding vectorized functions in R. You can define a dyadic operator that will behave differently than the "+" operator with regard to missing values:
`%+na%` <- function(x,y) {ifelse( is.na(x), y, ifelse( is.na(y), x, x+y) )}
mat[ , col3:= col1 %+na% col2]
#-------------------------------
col1 col2 col3
1: NA 0.003745 0.003745
2: 0.000000 0.007463 0.007463
3: -0.015038 -0.007407 -0.022445
4: 0.003817 -0.003731 0.000086
5: -0.011407 -0.007491 -0.018898
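For a quick check of how this operator treats each combination of missing values, here is a small base R sketch on made-up vectors (not the data above). It assumes the same definition of %+na% as given in this answer:

```r
# Same operator as defined in the answer
`%+na%` <- function(x, y) {
  ifelse(is.na(x), y, ifelse(is.na(y), x, x + y))
}

# Hypothetical inputs: NA on the left, NA on the right, neither, both
x <- c(NA, 1, 2, NA)
y <- c(5, NA, 3, NA)

res <- x %+na% y
print(res)
```

When both inputs are NA the result is NA, which differs from rowSums(..., na.rm = TRUE) and sum(..., na.rm = TRUE), both of which give 0 in that case.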
You can use mrdwad's comment to do it with sum(... , na.rm=TRUE):
mat[ , col4 := sum(col1, col2, na.rm=TRUE), by=1:NROW(mat)]