R data.table loop subset by factor and do lm()

猫巷女王i 2021-01-13 12:03

I am trying to create a function or even just work out how to run a loop using data.table syntax where I can subset the table by factor, in this case the id variable, then r

3 answers

时光取名叫无心 (original poster)

2021-01-13 12:37

If the coefficients are what you need, here is a data.table way.

library(data.table)

# id is recycled across the 30 rows, giving 10 rows per id
df <- data.frame(id = letters[1:3],
                 cyl = sample(c("a","b","c"), 30, replace = TRUE),
                 fac = sample(c(TRUE, FALSE), 30, replace = TRUE),
                 hp = sample(20:50, 30, replace = TRUE))

dt <- as.data.table(df)

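# Without a "by" variable you get one standard lm model for the whole table
fit = dt[, lm(hp ~ cyl + fac)]

# With id as the "by" variable you get one model per id;
# as.list() turns each coefficient into its own column.
# The data are random, so your numbers will differ from those shown.
coef_tbl = dt[, as.list(coef(lm(hp ~ cyl + fac))), by=id]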

   id (Intercept)      cylb      cylc    facTRUE
1:  a    30.59155  5.901408  2.732394   9.014085
2:  b    45.00000  2.500000 -7.000000  -7.000000
3:  c    35.00000 10.470588  4.176471 -20.705882
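
One caveat with the wide form above: the columns are built per group, so if some id happens to lack a level of cyl, the coefficient vectors can differ in length or naming between groups and the wide result may fail or misalign. A minimal sketch of a long-format variant that avoids this, assuming the same column names as the example; the seed and the names coef_long and coef_wide are illustrative and not part of the original answer:

```r
library(data.table)

set.seed(1)  # assumption: arbitrary seed, only to make this sketch repeatable
dt <- data.table(id  = rep(letters[1:3], times = 10),
                 cyl = sample(c("a", "b", "c"), 30, replace = TRUE),
                 fac = sample(c(TRUE, FALSE), 30, replace = TRUE),
                 hp  = sample(20:50, 30, replace = TRUE))

# One row per (id, term), so groups may return different sets of terms
coef_long <- dt[, {
  cf <- coef(lm(hp ~ cyl + fac))
  list(term = names(cf), estimate = unname(cf))
}, by = id]

# Reshape back to one row per id; terms missing for an id become NA
coef_wide <- dcast(coef_long, id ~ term, value.var = "estimate")
```

The long table is also easier to filter or plot by term; the dcast() step is only needed if you want the one-row-per-id layout shown above.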

EDIT

Added Anova results based on comments:

anova_tbl = dt[, as.list(anova(lm(hp ~ cyl + fac))), by=id]
row_names = dt[, row.names(anova(lm(hp ~ cyl + fac))), by=id]
anova_tbl[, variable := row_names$V1]

> anova_tbl
   id Df     Sum Sq    Mean Sq    F value     Pr(>F)  variable
1:  a  2  48.066667  24.033333 0.20758157 0.81814567       cyl
2:  a  1   5.666667   5.666667 0.04894434 0.83224735       fac
3:  a  6 694.666667 115.777778         NA         NA Residuals
4:  b  2  40.600000  20.300000 0.38310492 0.69729630       cyl
5:  b  1  51.571429  51.571429 0.97326443 0.36196440       fac
6:  b  6 317.928571  52.988095         NA         NA Residuals
7:  c  2 277.066667 138.533333 5.39740260 0.04559622       cyl
8:  c  1  89.833333  89.833333 3.50000000 0.11055174       fac
9:  c  6 154.000000  25.666667         NA         NA Residuals
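
The ANOVA table above fits each model twice, once for the values and once for the row names. A possible single-pass variant, sketched under the same assumptions about column names, captures the row names inside the same j expression; the seed is arbitrary and the layout (variable as the second column) is a choice made here, not something from the original answer:

```r
library(data.table)

set.seed(1)  # assumption: arbitrary seed, only to make this sketch repeatable
dt <- data.table(id  = rep(letters[1:3], times = 10),
                 cyl = sample(c("a", "b", "c"), 30, replace = TRUE),
                 fac = sample(c(TRUE, FALSE), 30, replace = TRUE),
                 hp  = sample(20:50, 30, replace = TRUE))

# Fit once per id, then return the term names next to the ANOVA columns
anova_tbl <- dt[, {
  a <- anova(lm(hp ~ cyl + fac))
  c(list(variable = rownames(a)), as.list(a))
}, by = id]
```

Because the names and the values come from the same anova() object, they cannot drift out of step between groups.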
