I am trying to count the number of columns that do not contain NA for each row, and place that value into a new column for that row.
Example data:
The two options that quickly come to mind are:
d[, num_obs := sum(!is.na(.SD)), by = 1:nrow(d)][]
d[, num_obs := rowSums(!is.na(d))][]
The first works by creating a "group" of just one row per group (1:nrow(d)), so that .SD holds a single row each time. Without that grouping, it would count the non-NA values across the entire table and assign that one total to every row.
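To make the grouping point concrete, here is a minimal sketch on a small made-up table. The name toy and its values are hypothetical and are not the example data from the question.

```r
library(data.table)

# Hypothetical toy data, made up purely for illustration
toy <- data.table(a = c(1, NA, 3), b = c(NA, NA, 6), c = c(7, 8, 9))

# One group per row: .SD is a one-row table, so the sum is a per-row count
per_row <- copy(toy)[, num_obs := sum(!is.na(.SD)), by = 1:nrow(toy)]

# No grouping: .SD is the whole table, so every row receives the same total
whole <- copy(toy)[, num_obs := sum(!is.na(.SD))]

per_row$num_obs  # 2 1 3
whole$num_obs    # 6 6 6
```

copy() is used only so that each variant starts from an unmodified table, since := changes its input by reference.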
The second makes use of an already very efficient base R function, rowSums.
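As a rough sketch of why the rowSums approach works: !is.na() on a table gives a logical matrix, and rowSums counts the TRUE values in each row. The toy table and the cols vector below are hypothetical. The .SDcols variant is an optional refinement, assuming you may want to limit the count to named columns so that a num_obs column left over from an earlier run is not counted itself.

```r
library(data.table)

# Hypothetical toy data, made up purely for illustration
toy <- data.table(a = c(1, NA, 3), b = c(NA, NA, 6), c = c(7, 8, 9))

# Logical matrix: TRUE where a value is present
present <- !is.na(toy)

# rowSums treats TRUE as 1, giving the per-row count of non-NA cells
counts <- rowSums(present)  # 2 1 3

# Hypothetical column selection: only these columns are inspected
cols <- c("a", "b", "c")
toy[, num_obs := rowSums(!is.na(.SD)), .SDcols = cols]
run1 <- toy$num_obs

# Running it again gives the same counts, because num_obs is not in cols
toy[, num_obs := rowSums(!is.na(.SD)), .SDcols = cols]
run2 <- toy$num_obs
```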
Here is a benchmark on larger data:
library(data.table)
set.seed(1)
nrow = 10000
ncol = 15
d <- as.data.table(matrix(sample(c(NA, -5:10), nrow*ncol, TRUE), nrow = nrow, ncol = ncol))
fun1 <- function(indt) indt[, num_obs := rowSums(!is.na(indt))][]
fun2 <- function(indt) indt[, num_obs := sum(!is.na(.SD)), by = 1:nrow(indt)][]
library(microbenchmark)
microbenchmark(fun1(copy(d)), fun2(copy(d)))
# Unit: milliseconds
#           expr        min         lq       mean     median         uq      max neval
#  fun1(copy(d))   3.727958   3.906458   5.507632   4.159704   4.475201 106.5708   100
#  fun2(copy(d)) 584.499120 655.634889 684.889614 681.054752 712.428684 861.1650   100
By the way, the empty [] is just there to print the resulting data.table. It is needed because :=, like the set* functions in "data.table", modifies the table by reference and returns its result invisibly, so nothing would be printed otherwise.