Calculate summary statistics (e.g. mean) on all numeric columns using data.table

不知归路 2020-12-09 23:40

I have data with both numeric and non-numeric columns like this:

mydt
          vnum1 vint1 vfac1 vch1
 1: -0.30159484     8     3          


        
1 Answer
  • 2020-12-10 00:25

    By searching SO for .SDcols, I landed on this answer, which I think explains quite nicely how to use it.

    cols = sapply(mydt, is.numeric)
    cols = names(cols)[cols]
    mydt[, lapply(.SD, mean), .SDcols = cols]
    #        vnum1 vint1
    # 1: -0.046491   4.5
    

    Doing mydt[, sapply(mydt, is.numeric), with = FALSE] (note: the "modern" way to do that is mydt[ , .SD, .SDcols = is.numeric]) is not that efficient, because it subsets your data.table to those columns, and that makes a (deep) copy, which uses more memory unnecessarily.
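Since the sample data in the question is cut off, here is a minimal self-contained sketch of the same idea. The table `dt` and its values are made up for illustration; only the column names follow the question. It assumes a reasonably recent data.table release, one that accepts a function in `.SDcols`.

```r
library(data.table)

# Hypothetical data: column names mirror the question, values are made up.
dt <- data.table(
  vnum1 = c(1.5, 2.5, 3.5, 4.5),
  vint1 = 1:4,
  vfac1 = factor(c("a", "b", "a", "b")),
  vch1  = c("w", "x", "y", "z")
)

# Option 1: build a character vector of the numeric column names,
# then restrict .SD to those columns.
num_cols <- names(dt)[sapply(dt, is.numeric)]
by_name <- dt[, lapply(.SD, mean), .SDcols = num_cols]

# Option 2: hand the predicate to .SDcols directly
# (assumes a data.table version that supports a function here).
by_fun <- dt[, lapply(.SD, mean), .SDcols = is.numeric]

print(by_name)
```

Both calls should select only `vnum1` and `vint1`, because `is.numeric()` is FALSE for factor and character columns.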

    Using colMeans instead coerces the data.table into a matrix, which again is not memory efficient.
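To make the coercion visible, the sketch below calls `as.matrix()` explicitly before `colMeans()` and compares the result with the `.SD` approach. The table `dt` is hypothetical, and the example again assumes a data.table version that accepts a function in `.SDcols`.

```r
library(data.table)

# Hypothetical data; values are made up for illustration.
dt <- data.table(x = c(1, 2, 3), n = c(10L, 20L, 30L), g = c("a", "b", "c"))

# colMeans() works on a matrix, so the numeric subset is first materialised
# and then converted to a matrix before any mean is computed.
via_matrix <- colMeans(as.matrix(dt[, .SD, .SDcols = is.numeric]))

# lapply() over .SD computes the mean one column at a time instead.
via_sd <- dt[, lapply(.SD, mean), .SDcols = is.numeric]
```

The two results hold the same values. `via_matrix` is a named numeric vector, while `via_sd` is a one-row data.table.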
