Why do I need to wrap `get` in a dummy function within a J `lapply` call?

Submitted by 三世轮回 on 2020-01-14 14:30:30

Question


I'm looking to process columns by criteria like class or common pattern matching via grep.

My first attempt did not work:

require(data.table)
test.table <- data.table(a=1:10,ab=1:10,b=101:110)
##this does not work and hangs on my machine
test.table[,lapply(names(test.table)[grep("a",names(test.table))], get)]

Ricardo Saporta notes in an answer that you can use this construct, but you have to wrap get in a dummy function:

##this works
test.table[,lapply(names(test.table)[grep("a",names(test.table))], function(x) get(x))]

Why do you need the anonymous function?

(The preferred/cleaner method is via .SDcols:)

test.table[,.SD,.SDcols=grep("a",names(test.table))]
test.table[, grep("a", names(test.table)), with = FALSE]
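The same .SDcols idea extends to selecting by class, which is the other criterion mentioned above. The following is a minimal sketch, not taken from the original post: the table `dt` and its columns are hypothetical, chosen so that one column has a different class.

```r
library(data.table)

# Hypothetical table: two integer columns whose names contain "a",
# plus one character column
dt <- data.table(a = 1:3, ab = 4:6, b = c("x", "y", "z"))

# Select by name pattern: build the vector of matching names first
a_cols <- grep("a", names(dt), value = TRUE)
by_pattern <- dt[, .SD, .SDcols = a_cols]

# Select by class: test each column, keep the names that pass
int_cols <- names(dt)[sapply(dt, is.integer)]
by_class <- dt[, .SD, .SDcols = int_cols]

# Process only the selected columns; no get() needed
doubled <- dt[, lapply(.SD, function(col) col * 2L), .SDcols = int_cols]
```

Because .SD already holds the selected columns, the lapply runs over the data itself rather than over column names, so the question of where get looks things up never arises.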

Answer 1:


This is a consequence of how lapply works, not really of data.table. From the lapply documentation:

For historical reasons, the calls created by lapply are unevaluated, and code has been written (e.g. bquote) that relies on this. This means that the recorded call is always of the form FUN(X[[0L]], ...), with 0L replaced by the current integer index. This is not normally a problem, but it can be if FUN uses sys.call or match.call or if it is a primitive function that makes use of the call. This means that it is often safer to call primitive functions with a wrapper, so that e.g. lapply(ll, function(x) is.numeric(x)) is required in R 2.7.1 to ensure that method dispatch for is.numeric occurs correctly.
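The recorded call described in that passage can be inspected directly. This is a small sketch that captures what sys.call() reports inside the function lapply invokes. The exact text of the index part differs between R versions, so no output is shown here.

```r
# Capture the call that each invocation of FUN sees from the inside
recorded <- lapply(1:2, function(x) deparse(sys.call()))

# Each element is a deparsed call that starts with "FUN(",
# regardless of what the function was named by the caller
print(recorded)
```

The point is that the function is always invoked under the name FUN from within lapply, so anything that inspects the call or the calling frame sees lapply's internals rather than the code that called lapply.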

Update re @Hadley's and @DWin's comments:

EE <- new.env()
EE$var1 <- "I am var1 in EE"
EE$var2 <- "I am var2 in EE"

## Calling get directly
with(EE, lapply(c("var1", "var2"), get))
Error in FUN(c("var1", "var2")[[1L]], ...) : object 'var1' not found

## Calling get via an anonymous function
with(EE, lapply(c("var1", "var2"), function(x) get(x)))
[[1]]
[1] "I am var1 in EE"

[[2]]
[1] "I am var2 in EE"

with(EE, lapply(c("var1", "var2"), rm))
Error in FUN(c("var1", "var2")[[1L]], ...) : 
  ... must contain names or character strings

with(EE, lapply(c("var1", "var2"), function(x) rm(x)))
[[1]]
NULL

[[2]]
NULL

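# Note: rm(x) removes only the wrapper's local variable x, so var1 & var2
# are still in EE; rm(list = x, envir = EE) would actually remove them
EE
<environment: 0x1154d0060>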
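A wrapper is not the only way to make the lookup succeed. As far as I understand it, get and rm default to the frame they are called from, and when lapply calls them directly that frame is not EE. Naming the environment explicitly sidesteps the issue. The sketch below reuses the EE setup from above; the variable names `direct`, `all_at_once` and `remaining` are only for illustration.

```r
EE <- new.env()
EE$var1 <- "I am var1 in EE"
EE$var2 <- "I am var2 in EE"

# Extra arguments to lapply are passed on to FUN, so get() is told
# exactly where to look and no wrapper function is needed
direct <- lapply(c("var1", "var2"), get, envir = EE)

# mget() fetches several objects by name in one call (returns a named list)
all_at_once <- mget(c("var1", "var2"), envir = EE)

# rm() takes character names through `list`, again with an explicit envir
rm(list = c("var1", "var2"), envir = EE)
remaining <- ls(EE)
```

After the rm call, ls(EE) is empty, which is an easy way to confirm that the objects are really gone.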



Answer 2:


While @Ricardo is correct that it is safer to wrap primitives, or functions that rely on method dispatch, in a wrapper, here we can avoid the wrapper by giving get the correct environment in which to search. The trick with lapply is to use sys.parent(n) (in this case n = 0 will work) to obtain the appropriate calling environment.

test.table[,lapply(grep('a',names(test.table),value=TRUE), 
                    get, envir = sys.parent(0))]

(More information can be found in the question "Using get inside lapply, inside a function".)




Answer 3:


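This happens only because data.table evaluates the j expression (in simpler terms, everything after the first comma in DT[,...]) as an actual expression. So DT[,"Column1"] returns "Column1", just as with(DT, "Column1") returns "Column1" (this describes data.table at the time of writing; from version 1.9.8 on, a quoted column name in j selects that column). This is covered in the data.table FAQ.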

If you want, you can do:

DT[, names(DT), with = FALSE]
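The with() analogy can be tried in base R without data.table at all. This is a minimal sketch using a hypothetical data frame `df`; it shows only how with() treats a string versus a name, and is not a statement about data.table internals.

```r
# Hypothetical data frame standing in for DT
df <- data.frame(a = 1:3, ab = 4:6, b = 7:9)

# A quoted name is just a character string when evaluated as an expression
as_string <- with(df, "a")

# An unquoted name is looked up among the columns
as_column <- with(df, a)

# get() turns the string back into a lookup; wrapped in a function defined
# inside with(), it can see the columns
looked_up <- with(df, lapply(c("a", "ab"), function(x) get(x)))
```

Here as_string is the one-element character vector "a", while as_column and the elements of looked_up are the column values themselves.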


Source: https://stackoverflow.com/questions/18064602/why-do-i-need-to-wrap-get-in-a-dummy-function-within-a-j-lapply-call
