using lm(my_formula) inside [.data.table's j

。_饼干妹妹 提交于 2019-12-07 10:36:32

问题


I have gotten in the habit of accessing data.table columns in j even when I do not need to:

require(data.table)
set.seed(1); n = 10
DT <- data.table(x=rnorm(n),y=rnorm(n))

frm <- formula(x~y)

DT[,lm(x~y)]         # 1 works
DT[,lm(frm)]         # 2 fails
lm(frm,data=DT)      # 3 what I'll do instead

I expected # 2 to work, since lm should search for variables in DT and then in the global environment... Is there an elegant way to get something like # 2 to work?

In this case, I'm using lm, which takes a "data" argument, so # 3 works just fine.

EDIT. Note that this works:

x1 <- DT$x
y1 <- DT$y
frm1 <- formula(x1~y1)
lm(frm1)

and this, too:

rm(x1,y1)
bah <- function(){
    x1 <- DT$x
    y1 <- DT$y
    frm1 <- formula(x1~y1)
    lm(frm1)
}
bah()

EDIT2. However, this fails, illustrating @eddi's answer

frm1 <- formula(x1~y1)
bah1 <- function(){
    x1 <- DT$x
    y1 <- DT$y
    lm(frm1)
}
bah1()

回答1:


The way lm works it looks for the variables used in the environment of the formula supplied. Since you create your formula in the global environment, it's not going to look in the j-expression environment, so the only way to make the exact expression lm(frm) work would be to add the appropriate variables to the correct environment:

DT[, {assign('x', x, environment(frm));
      assign('y', y, environment(frm));
      lm(frm)}]

Now obviously this is not a very good solution, and both Arun's and Josh's suggestions are much better and I'm just putting it here for the understanding of the problem at hand.

edit Another (possibly more perverted, and quite fragile) way would be to change the environment of the formula at hand (I do it permanently here, but you could revert it back, or copy it and then do it):

DT[, {setattr(frm, '.Environment', get('SDenv', parent.frame(2))); lm(frm)}]

Btw a funny thing is happening here - whenever you use get in j-expression, all of the variables get constructed (so don't use it if you can avoid it), and this is why I don't need to also use x and y in some way for data.table to know that those variables are needed.



来源:https://stackoverflow.com/questions/19311600/using-lmmy-formula-inside-data-tables-j

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!