I have 500K users and I need to compute a linear regression (with intercept) for each of them.
Each user has around 30 records.
I t
You could try data.table for this. The example below uses toy data, but I'd expect data.table to give some improvement, since its grouped operations are quite speedy. Your data set is large, though, so benchmark this approach on a smaller sample first to see how much faster it really is. Good luck.
library(data.table)
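# toy data: 3 users with 20 records each
# (named dt rather than exp, to avoid masking base::exp)
dt <- data.table(id = rep(c("a", "b", "c"), each = 20),
                 x  = rnorm(60, 5, 1.5),
                 y  = rnorm(60, 2, 0.2))
# edit: it might also help to set a key on id with such a large data set;
# with the toy example it makes no difference, of course.
# setkey() modifies dt by reference, so no reassignment is needed
setkey(dt, id)
# the nuts and bolts of the data.table part of the answer:
# fit lm() within each id and return the coefficients as columns
result <- dt[, as.list(coef(lm(y ~ x))), by = id]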
result
   id (Intercept)            x
1:  a    2.013548 -0.008175644
2:  b    2.084167 -0.010023549
3:  c    1.907410  0.015823088
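If calling lm() once per user is still too slow at 500K users, one option (not part of the approach above, just a sketch) is to skip lm() and use the closed-form solution for a single-predictor regression: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). The simulated data, the sample size `n_users`, and the names `fast` and `ref` are assumptions made up for illustration. The sketch also times both versions on a small sample, as suggested above.

```r
library(data.table)

# Simulated sample (hypothetical): 1000 users with 30 records each
set.seed(1)
n_users <- 1000
dt <- data.table(id = rep(seq_len(n_users), each = 30),
                 x  = rnorm(n_users * 30, 5, 1.5),
                 y  = rnorm(n_users * 30, 2, 0.2))

# Closed-form simple regression per user, no lm() call:
#   slope     = cov(x, y) / var(x)
#   intercept = mean(y) - slope * mean(x)
time_fast <- system.time(
  fast <- dt[, {
    slope <- cov(x, y) / var(x)
    list(intercept = mean(y) - slope * mean(x), slope = slope)
  }, by = id]
)

# Reference: lm() per user, as in the answer above
time_ref <- system.time(
  ref <- dt[, as.list(coef(lm(y ~ x))), by = id]
)

# Both versions should agree up to floating-point error
all.equal(fast$intercept, ref[["(Intercept)"]])
all.equal(fast$slope, ref$x)

# Compare elapsed times on this sample before scaling up
rbind(closed_form = time_fast, lm = time_ref)
```

This only covers the one-predictor case in the question (y ~ x with an intercept), and it returns only the coefficients, not standard errors or other lm() output. Timings will vary by machine, so treat the comparison as a rough guide.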