Loop through data.table and create new columns based on some condition

无人共我 2021-01-13 04:34

I have a data.table with quite a few columns. I need to loop through them and create new columns based on some condition. Currently I am writing a separate line of condition for

3 Answers
  •  情歌与酒
    2021-01-13 05:25

    We can do this with :=. First, take the column names that are not grouping variables ('nm'). Next, build the vector of names for the new columns with outer ('nm1'). Finally, apply the OP's code to each column of .SD by group, flatten the result one level with unlist(recursive=FALSE), and assign (:=) it to 'nm1' to create the new columns.

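    nm <- names(DT)[-(1:2)]   # value columns; assumes town and tc are the first two columns
    
    # c() reads the matrix column by column, so the four names for each value
    # column stay together, matching the order of the unlisted results below
    nm1 <- c(outer(c("Mean", "SD", "uplimit", "lowlimit"), nm, paste, sep="_"))
    
    DT[, (nm1) := unlist(lapply(.SD, function(x) {
             Mean <- mean(x)
             SD <- sd(x)
             uplimit <- Mean + 1.96*SD
             lowlimit <- Mean - 1.96*SD
             list(Mean, SD, uplimit, lowlimit)
           }), recursive=FALSE),
       by = .(town, tc)]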
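
    As a rough illustration of this step, the sketch below runs it on a small made-up table. The data values and the column names 'a' and 'b' are assumptions for the example, not the OP's data. The names in 'nm1' have to be ordered column by column (all four statistics for 'a', then all four for 'b') so that they line up with the unlisted values, which is why the sketch builds them with c(outer(...)).

```r
library(data.table)

# Hypothetical toy data: two grouping columns followed by two numeric columns.
DT <- data.table(
  town = c("A", "A", "A", "B", "B", "B"),
  tc   = c("x", "x", "x", "y", "y", "y"),
  a    = c(1, 2, 3, 10, 20, 30),
  b    = c(2, 4, 6, 5, 5, 8)
)

nm    <- names(DT)[-(1:2)]                        # "a" "b"
stats <- c("Mean", "SD", "uplimit", "lowlimit")
# Column-major order: the four statistics for "a", then the four for "b".
nm1   <- c(outer(stats, nm, paste, sep = "_"))

# Each group returns one scalar per new column; := recycles it within the group.
DT[, (nm1) := unlist(lapply(.SD, function(x) {
  Mean <- mean(x)
  SD   <- sd(x)
  list(Mean, SD, Mean + 1.96 * SD, Mean - 1.96 * SD)
}), recursive = FALSE), by = .(town, tc), .SDcols = nm]

print(DT[, .(town, tc, a, Mean_a, SD_a, b, Mean_b, SD_b)])
```

    Passing .SDcols = nm is optional here, since .SD already excludes the grouping columns, but it makes the set of summarised columns explicit.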
    

    The second part of the question involves a logical comparison between columns. One option is to subset the original columns, the 'lowlimit' columns, and the 'uplimit' columns separately and compare them directly, since all three subsets have the same dimensions. The result is a logical matrix, which unary + coerces to 0/1 (1 where the value lies within the limits). Assigning it back to the original dataset creates the outlier columns.

    m1 <- +(DT[, nm, with = FALSE] >= DT[, paste("lowlimit", nm, sep="_"), with = FALSE] &
            DT[, nm, with = FALSE] <= DT[, paste("uplimit", nm, sep="_"), with = FALSE])
    DT[, paste(nm, "Aoutlier", sep=".") := as.data.frame(m1)]
    

    Alternatively, instead of comparing whole data.tables, we can use a for loop with set, which is more efficient:

    nm2 <- paste(nm, "Aoutlier", sep=".")
    DT[, (nm2) := NA_integer_]
    for(j in nm){
     set(DT, i = NULL, j = paste(j, "Aoutlier", sep="."), 
       value = as.integer(DT[[j]] >= DT[[paste("lowlimit", j, sep="_")]] & 
               DT[[j]] <= DT[[paste("uplimit", j, sep="_")]]))
     }
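
    A minimal sketch of the set() loop on made-up data may help show what the flag means. The table below and its limit values are assumptions chosen for the example. Note that the flag is 1 when the value lies within the limits and 0 when it falls outside them.

```r
library(data.table)

# Hypothetical toy data with the limit columns already in place.
DT <- data.table(
  a          = c(1, 5, 9),
  lowlimit_a = c(2, 2, 2),
  uplimit_a  = c(8, 8, 8)
)
nm <- "a"

for (j in nm) {
  inside <- DT[[j]] >= DT[[paste("lowlimit", j, sep = "_")]] &
            DT[[j]] <= DT[[paste("uplimit", j, sep = "_")]]
  # set() adds or overwrites the column by reference.
  set(DT, j = paste(j, "Aoutlier", sep = "."), value = as.integer(inside))
}

print(DT)
```

    Because set() can create a column by name, pre-filling the new columns with NA_integer_ is not strictly required in this sketch.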
    

    The 'log' columns can also be created with :=

    DT[,paste(nm, "log", sep=".") := lapply(.SD,log),by = .(town,tc),.SDcols=nm]
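
    As a small sketch of the same idea on made-up data (the values and column names are assumptions for the example): since log() works elementwise, the grouping makes no difference to the result, so this version leaves out 'by'.

```r
library(data.table)

# Hypothetical toy data.
DT <- data.table(
  town = c("A", "B"),
  tc   = c("x", "y"),
  a    = c(1, exp(1)),
  b    = c(10, 100)
)
nm       <- c("a", "b")
log_cols <- paste(nm, "log", sep = ".")

# One new column per entry of nm, assigned by reference.
DT[, (log_cols) := lapply(.SD, log), .SDcols = nm]

print(DT)
```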
    
