R data.table loop subset by factor and do lm()

猫巷女王i 2021-01-13 12:03

I am trying to create a function or even just work out how to run a loop using data.table syntax where I can subset the table by factor, in this case the id variable, then r

3 answers
  •  时光取名叫无心
    2021-01-13 12:37

    If the coefficients are what you need, here is a data.table way.

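    library(data.table)

    # Example data: the values are sampled at random, so your numbers will differ
    df <- data.frame(id = letters[1:3],
                     cyl = sample(c("a","b","c"), 30, replace = TRUE),
                     fac = sample(c(TRUE, FALSE), 30, replace = TRUE),
                     hp = sample(c(20:50), 30, replace = TRUE))

    dt <- as.data.table(df)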
    
    # Without a "by" variable you get a standard lm model fitted to the whole table
    fit = dt[, lm(hp ~ cyl + fac)]
    
    # Using id as a "by" variable you get a model per id
    coef_tbl = dt[, as.list(coef(lm(hp ~ cyl + fac))), by=id]
    
       id (Intercept)      cylb      cylc    facTRUE
    1:  a    30.59155  5.901408  2.732394   9.014085
    2:  b    45.00000  2.500000 -7.000000  -7.000000
    3:  c    35.00000 10.470588  4.176471 -20.705882
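
The wide layout above needs every id to produce the same set of coefficients. As a hedged alternative, the sketch below returns one row per id and term, which also leaves room for standard errors and p-values. The data, the seed, and the names `coef_long`, `term`, `estimate`, `std_error` and `p_value` are assumptions made for illustration; they are not part of the original answer.

```r
library(data.table)

set.seed(42)  # arbitrary seed, only so this sketch is repeatable
dt <- data.table(
  id  = rep(letters[1:3], each = 10),
  cyl = rep(c("a", "b", "c"), length.out = 30),
  fac = rep(c(TRUE, FALSE), length.out = 30),
  hp  = sample(20:50, 30, replace = TRUE)
)

# Fit one model per id and return its coefficient table in long format
coef_long <- dt[, {
  fit <- lm(hp ~ cyl + fac)
  est <- coef(summary(fit))  # matrix with one row per estimated term
  .(term      = rownames(est),
    estimate  = est[, "Estimate"],
    std_error = est[, "Std. Error"],
    p_value   = est[, "Pr(>|t|)"])
}, by = id]

print(coef_long)
```

If the wide shape is still wanted, `dcast(coef_long, id ~ term, value.var = "estimate")` should give one column per term.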
    

    EDIT

    Added ANOVA results based on the comments:

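    # One ANOVA table per id; the term names are row names, which as.list() drops
    anova_tbl = dt[, as.list(anova(lm(hp ~ cyl + fac))), by=id]
    # Recover the term names with a second grouped call and attach them
    row_names = dt[, row.names(anova(lm(hp ~ cyl + fac))), by=id]
    anova_tbl[, variable := row_names$V1]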
    
    > anova_tbl
       id Df     Sum Sq    Mean Sq    F value     Pr(>F)  variable
    1:  a  2  48.066667  24.033333 0.20758157 0.81814567       cyl
    2:  a  1   5.666667   5.666667 0.04894434 0.83224735       fac
    3:  a  6 694.666667 115.777778         NA         NA Residuals
    4:  b  2  40.600000  20.300000 0.38310492 0.69729630       cyl
    5:  b  1  51.571429  51.571429 0.97326443 0.36196440       fac
    6:  b  6 317.928571  52.988095         NA         NA Residuals
    7:  c  2 277.066667 138.533333 5.39740260 0.04559622       cyl
    8:  c  1  89.833333  89.833333 3.50000000 0.11055174       fac
    9:  c  6 154.000000  25.666667         NA         NA Residuals
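
The two grouped calls above fit every model twice. A possible single-pass variant is sketched below: it captures the row names inside the same `j` expression that builds the ANOVA table. The data, the seed, and the names `anova_long` and `term` are illustrative assumptions, not taken from the original answer.

```r
library(data.table)

set.seed(42)  # arbitrary seed, only so this sketch is repeatable
dt <- data.table(
  id  = rep(letters[1:3], each = 10),
  cyl = rep(c("a", "b", "c"), length.out = 30),
  fac = rep(c(TRUE, FALSE), length.out = 30),
  hp  = sample(20:50, 30, replace = TRUE)
)

# Fit each model once and keep the ANOVA row names as a regular column
anova_long <- dt[, {
  a <- anova(lm(hp ~ cyl + fac))
  c(list(term = rownames(a)), as.list(a))
}, by = id]

print(anova_long)
```

Because the term names and the statistics come from the same `anova()` call, they cannot drift out of order between groups.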
    
