Why is the built-in lm function so slow in R?

栀梦 2020-12-01 16:38

I always thought that the lm function was extremely fast in R, but as this example would suggest, the closed-form solution computed using the solve func

2 answers
  •  栀梦 (original poster)
    2020-12-01 17:27

    You are overlooking that

    • solve() returns only the coefficient estimates
    • lm() returns a (very rich) object with many components for subsequent analysis, inference, plots, ...
    • the main cost of your lm() call is not the projection but the resolution of the formula y ~ ., from which the model matrix needs to be built.
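As a rough illustration of the three points above, here is a minimal sketch on simulated data. The data, dimensions, and variable names are assumptions made for this example and are not taken from the question. It fits the same model three ways and then compares what each call hands back.

```r
set.seed(42)
n <- 1000
p <- 5

# Simulated design matrix with an explicit intercept column (illustrative data)
X <- cbind(1, matrix(rnorm(n * p), n, p))
beta <- c(2, seq_len(p))
y <- drop(X %*% beta + rnorm(n))
dat <- data.frame(y = y, X[, -1])

# Closed-form normal equations: the result is just a coefficient vector
b_solve <- drop(solve(crossprod(X), crossprod(X, y)))

# lm.fit(): QR-based fit on a ready-made matrix, no formula handling
fit_raw <- lm.fit(X, y)

# lm(): resolves the formula, builds the model frame and model matrix, then fits
fit_lm <- lm(y ~ ., data = dat)

# The coefficients agree up to numerical tolerance
print(all.equal(b_solve, unname(coef(fit_raw))))
print(all.equal(b_solve, unname(coef(fit_lm))))

# The lm object carries far more than the coefficients
print(names(fit_lm))
```

The point of the comparison is that solve() answers only one of the questions that lm() answers, so timing them against each other measures different amounts of work.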

    To illustrate Rcpp, we wrote a few variants of a function fastLm() that does more of what lm() does (i.e. a bit more than lm.fit() from base R) and measured it. See, e.g., this benchmark script, which clearly shows that the dominant cost for smaller data sets is in parsing the formula and building the model matrix.
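The benchmark script referred to above is not reproduced here. As a stand-in, the following base-R sketch (not that script; the data size, repetition count, and variable names are assumptions) times the formula and model-matrix step separately from the numerical fit, so the two costs can be compared on your own machine.

```r
set.seed(1)
n <- 500
p <- 10
dat <- data.frame(y = rnorm(n), matrix(rnorm(n * p), n, p))
reps <- 200

# Formula resolution plus model matrix construction, without any fitting
t_mm <- system.time(for (i in seq_len(reps)) {
  mf <- model.frame(y ~ ., data = dat)
  mm <- model.matrix(attr(mf, "terms"), mf)
})[["elapsed"]]

# Numerical fit only, on a matrix built once up front
X <- model.matrix(y ~ ., data = dat)
y <- dat$y
t_fit <- system.time(for (i in seq_len(reps)) lm.fit(X, y))[["elapsed"]]

# Full lm() call: formula handling, fit, and construction of the lm object
t_lm <- system.time(for (i in seq_len(reps)) lm(y ~ ., data = dat))[["elapsed"]]

print(c(model_matrix = t_mm, lm_fit = t_fit, lm = t_lm))
```

Timings depend on hardware, R version, and BLAS, so no expected numbers are given; a dedicated benchmarking package would give more stable measurements than system.time().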

    In short, you are doing the Right Thing by benchmarking, but the comparison itself is flawed: you are comparing what is mostly incomparable, a narrow subtask against a much larger task.
