I am trying to create a function or even just work out how to run a loop using data.table syntax where I can subset the table by factor, in this case the id variable, then r
If the coefficients are what you need, here is a data.table way.
library(data.table)

# sample() is random and no seed is set, so the numbers will vary between runs
df <- data.frame(id = letters[1:3],
                 cyl = sample(c("a", "b", "c"), 30, replace = TRUE),
                 fac = sample(c(TRUE, FALSE), 30, replace = TRUE),
                 hp = sample(c(20:50), 30, replace = TRUE))
dt = as.data.table(df)
# Without a "by" variable you get a single standard lm model
fit = dt[, lm(hp ~ cyl + fac)]
# Using id as a "by" variable you get a model per id
coef_tbl = dt[, as.list(coef(lm(hp ~ cyl + fac))), by=id]
   id (Intercept)      cylb      cylc    facTRUE
1:  a    30.59155  5.901408  2.732394   9.014085
2:  b    45.00000  2.500000 -7.000000  -7.000000
3:  c    35.00000 10.470588  4.176471 -20.705882
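Since the question asks about a function, the grouped call above could be wrapped roughly as follows. This is only a sketch: `coef_by_group`, its argument names, and the toy data are made up for illustration. It assumes every group contains all levels of each predictor; if a group is missing a level, the coefficient vectors differ in length and the grouped call should fail.

```r
library(data.table)

# Hypothetical helper: fit lm(fml) within each level of `group_col`
# and return one row of coefficients per group.
coef_by_group <- function(dt, fml, group_col) {
  # group_col is a character column name; .SD holds the rows of the current group
  dt[, as.list(coef(lm(fml, data = .SD))), by = group_col]
}

# Toy data (assumed), laid out so each id sees every cyl level and both fac values
set.seed(1)
toy <- data.table(
  id  = rep(letters[1:3], each = 10),
  cyl = rep(c("a", "b", "c"), length.out = 30),
  fac = rep(c(TRUE, FALSE), length.out = 30),
  hp  = sample(20:50, 30, replace = TRUE)
)

res <- coef_by_group(toy, hp ~ cyl + fac, "id")
print(res)
```

Passing the grouping column as a string keeps the helper reusable for other factors without rewriting the `by` argument.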
EDIT
Added ANOVA results based on comments:
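# One ANOVA table per id; as.list() turns its columns into data.table columns
anova_tbl = dt[, as.list(anova(lm(hp ~ cyl + fac))), by=id]
# as.list() drops the term names, so recover them with a second grouped call
row_names = dt[, row.names(anova(lm(hp ~ cyl + fac))), by=id]
anova_tbl[, variable := row_names$V1]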
> anova_tbl
   id Df     Sum Sq    Mean Sq    F value     Pr(>F)  variable
1:  a  2  48.066667  24.033333 0.20758157 0.81814567       cyl
2:  a  1   5.666667   5.666667 0.04894434 0.83224735       fac
3:  a  6 694.666667 115.777778         NA         NA Residuals
4:  b  2  40.600000  20.300000 0.38310492 0.69729630       cyl
5:  b  1  51.571429  51.571429 0.97326443 0.36196440       fac
6:  b  6 317.928571  52.988095         NA         NA Residuals
7:  c  2 277.066667 138.533333 5.39740260 0.04559622       cyl
8:  c  1  89.833333  89.833333 3.50000000 0.11055174       fac
9:  c  6 154.000000  25.666667         NA         NA Residuals
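The two grouped calls above fit each model twice, once for the table and once for the row names. A possible single-pass variant is sketched below: it fits the model once per group and returns the term names together with the ANOVA columns. The toy data and the name `anova_once` are assumptions for illustration, not part of the original answer.

```r
library(data.table)

# Toy data (assumed), laid out so each id sees every cyl level and both fac values
set.seed(1)
toy <- data.table(
  id  = rep(letters[1:3], each = 10),
  cyl = rep(c("a", "b", "c"), length.out = 30),
  fac = rep(c(TRUE, FALSE), length.out = 30),
  hp  = sample(20:50, 30, replace = TRUE)
)

anova_once <- toy[, {
  a <- anova(lm(hp ~ cyl + fac))               # fit once per group
  c(list(variable = rownames(a)), as.list(a))  # term names plus ANOVA columns
}, by = id]

print(anova_once)
```

Here `variable` ends up right after `id` instead of at the end; `setcolorder()` can move it if the original column order matters.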