Fast linear regression by group

后悔当初 2020-12-08 08:31

I have 500K users and I need to compute a linear regression (with intercept) for each of them.

Each user has around 30 records.

I t

5 Answers
  •  情书的邮戳
    2020-12-08 09:15

    You could try data.table, as shown below. I've only created some toy data here, but I'd expect data.table to give some improvement; it's quite fast. Since your data set is large, benchmark this approach on a smaller sample first to see whether the speedup is worthwhile. Good luck.

    
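        library(data.table)
    
        # toy data: 3 groups of 20 rows (named dt to avoid masking base::exp)
        dt <- data.table(id = rep(c("a","b","c"), each = 20), x = rnorm(60,5,1.5), y = rnorm(60,2,.2))
        # edit: it might also help to set a key on id with such a large data set
        # (with the toy example it makes no difference, of course)
        setkey(dt, id)  # sets the key by reference, so no reassignment is needed
        # the nuts and bolts of the data.table part of the answer:
        # fit lm() within each id and return the coefficients as columns
        result <- dt[, as.list(coef(lm(y ~ x))), by = id]
        result
        # output from one run (no seed is set, so values will differ):
        #    id (Intercept)            x
        # 1:  a    2.013548 -0.008175644
        # 2:  b    2.084167 -0.010023549
        # 3:  c    1.907410  0.015823088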
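
    If calling lm() once per group is still too slow at 500K groups, one option (not part of the original answer, just a sketch) is to skip lm() and use the closed-form solution for a one-predictor regression: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). The toy data, the seed, and the names dt, fast and ref below are assumptions for illustration only. The last two lines compare the result against lm() and should both return TRUE.

    ```r
    library(data.table)

    set.seed(1)  # hypothetical toy data: 3 groups of 20 rows
    dt <- data.table(
      id = rep(c("a", "b", "c"), each = 20),
      x  = rnorm(60, 5, 1.5),
      y  = rnorm(60, 2, 0.2)
    )

    # Closed-form simple regression within each group:
    #   slope     = cov(x, y) / var(x)
    #   intercept = mean(y) - slope * mean(x)
    fast <- dt[, {
      slope <- cov(x, y) / var(x)
      list(intercept = mean(y) - slope * mean(x), slope = slope)
    }, by = id]

    # Sanity check: the same coefficients from lm(), group by group
    ref <- dt[, as.list(coef(lm(y ~ x))), by = id]
    all.equal(fast$intercept, ref[["(Intercept)"]])
    all.equal(fast$slope, ref$x)
    ```

    This only covers the single-predictor case with an intercept, and a group where x is constant gives var(x) = 0, so the slope comes out as NaN and would need separate handling.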
    
