Why does lapply() not retain my data.table keys?

Posted by 痴心易碎 on 2020-01-01 04:24:09

Question


I have a bunch of data.tables in a list. I want to apply unique() to each data.table in my list, but doing so destroys all my data.table keys.

Here's an example:

library(data.table)

A <- data.table(a = rep(c("a","b"), each = 3), b = runif(6), key = "a")
B <- data.table(x = runif(6), b = runif(6), key = "x")

blah <- unique(A)

Here, blah still has a key, and everything is right in the world:

key(blah)

# [1] "a"

But if I add the data.tables to a list and use lapply(), the keys get destroyed:

dt.list <- list(A, B)

unique.list <- lapply(dt.list, unique) # Keys destroyed here

lapply(unique.list, key) 

# [[1]]
# NULL

# [[2]]
# NULL

This probably has to do with me not really understanding what it means for keys to be assigned "by reference," as I've had other problems with keys disappearing.

So:

  • Why does lapply not retain my keys?
  • What does it mean to say keys are assigned "by reference"?
  • Should I even be storing data.tables in a list?
  • How can I safely store/manipulate data.tables without fear of losing my keys?

EDIT:

For what it's worth, the dreaded for loop works just fine and keeps the keys:

unique.list <- list()

for (i in seq_along(dt.list)) {
  unique.list[[i]] <- unique(dt.list[[i]])
}

lapply(unique.list, key)

# [[1]]
# [1] "a"

# [[2]]
# [1] "x"

But this is R, and for loops are evil.


Answer 1:


Interestingly, these two calls give different results:

lapply(dt.list, unique) 
lapply(dt.list, function(x) unique(x)) 

If you use the latter, the results are as you would expect.


The seemingly unexpected behavior arises because the first lapply statement invokes unique.data.frame (i.e., the method from base R), which knows nothing about keys, while the second invokes unique.data.table.
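As a minimal sketch of the wrapper form described above, the following rebuilds the question's dt.list, applies unique() through an anonymous function, and inspects the keys afterwards. The name keys is introduced here only for illustration. Whether the unwrapped lapply(dt.list, unique) also drops the keys may depend on the installed data.table and R versions, so only the wrapped form is shown.

```r
library(data.table)

# Same example tables as in the question.
A <- data.table(a = rep(c("a", "b"), each = 3), b = runif(6), key = "a")
B <- data.table(x = runif(6), b = runif(6), key = "x")
dt.list <- list(A, B)

# Wrap unique() in an anonymous function, as suggested in this answer.
unique.list <- lapply(dt.list, function(x) unique(x))

# Collect the key of each result to check that it survived.
keys <- lapply(unique.list, key)
print(keys)
```

If a key ever does go missing, it can be restored explicitly with setkeyv() on the affected table.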




Answer 2:


Good question. It turns out that this is documented in ?lapply (see the Note section):

For historical reasons, the calls created by lapply are unevaluated, and code has been written (e.g. bquote) that relies on this. This means that the recorded call is always of the form FUN(X[[0L]], ...), with 0L replaced by the current integer index. This is not normally a problem, but it can be if FUN uses sys.call or match.call or if it is a primitive function that makes use of the call. This means that it is often safer to call primitive functions with a wrapper, so that e.g. lapply(ll, function(x) is.numeric(x)) is required in R 2.7.1 to ensure that method dispatch for is.numeric occurs correctly.
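To see the recorded call that the Note refers to, one option is to have FUN return sys.call(). This is only a sketch; the names calls and call.text are illustrative, and the index placeholder printed inside X[[...]] may differ between R versions (the passage quoted above comes from an older version of the help page).

```r
# Capture the call that lapply() records for each element.
calls <- lapply(1:2, function(x) sys.call())

# Turn each recorded call into a single string for inspection.
call.text <- vapply(
  calls,
  function(cl) paste(deparse(cl), collapse = ""),
  character(1)
)
print(call.text)
```

Each recorded call has the form FUN(X[[...]], ...) rather than naming the function that was actually passed in, which is why code that inspects the call can behave differently under lapply.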



Source: https://stackoverflow.com/questions/14928278/why-does-lapply-not-retain-my-data-table-keys
