How to speed up GLM estimation?

前端 未结 3 762
花落未央
花落未央 2020-12-10 04:38

I am using RStudio 0.97.320 (R 2.15.3) on Amazon EC2. My data frame has 200k rows and 12 columns.

I am trying to fit a logistic regression with approximately 1500 p

3条回答
  •  被撕碎了的回忆
    2020-12-10 05:18

    Assuming that your design matrix is not sparse, then you can also consider my package parglm. See this vignette for a comparison of computation times and further details. I show a comparison here of computation times on a related question.

    One of the methods in the parglm function works as the bam function in mgcv. The method is described in detail in

    Wood, S.N., Goude, Y. & Shaw S. (2015) Generalized additive models for large datasets. Journal of the Royal Statistical Society, Series C 64(1): 139-155.

    On advantage of the method is that one can implement it with non-concurrent QR implementation and still do the computation in parallel. Another advantage is a potentially lower memory footprint. This is used in mgcv's bam function and could also be implemented here with a setup as in speedglm's shglm function.

提交回复
热议问题