This looks like an fread bug, but I am not sure.
This example reproduces my problem. I have a function where I read a data.table and return it in a list. i us
Arun's answer is a great explanation. The specific feature of list() in R <= 3.0.2 is that it copies named inputs (things that have been named before the call to list()). In r-devel now (the next version of R), this copy by list() no longer happens and all will be well. It's a very welcome change in R.
In the meantime, you can work around it by creating the output list in a different way.
> R.version.string
[1] "R version 3.0.2 (2013-09-25)"
First, demonstrate list() copying:
> DT = data.table(a=1:3)
> address(DT)
[1] "0x1d70010"
> address(list(DT)[[1]])
[1] "0x21bc178" # different address => list() copied the data.table named DT
> data.table:::selfrefok(DT)
[1] 1
> data.table:::selfrefok(list(DT)[[1]])
[1] 0 # i.e. this copied DT is not over-allocated
Now a different way to create the same list:
> ans = list()
> ans$DT = DT # use $<- instead
> address(DT)
[1] "0x1d70010"
> address(ans$DT)
[1] "0x1d70010" # good, no copy
> identical(ans, list(DT=DT))
[1] TRUE
> data.table:::selfrefok(ans$DT)
[1] 1 # good, the list()-ed DT is still over-allocated ok
Convoluted and confusing, I know. Using $<- to create the output list, or even just placing the call to fread inside the call to list(), i.e. list(DT=fread(...)), should avoid the copy by list().
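As a rough sketch of how the two workarounds might look inside a function: the helper names read_as_list and read_as_list2 and the inline CSV text are assumptions made up for illustration, not taken from the original question.

```r
library(data.table)

# Hypothetical helper: call fread() directly inside list(), so the result
# is never bound to a name before list() sees it.
read_as_list <- function(input) {
  list(DT = fread(input))
}

# Hypothetical alternative: bind the table to a name first, then build the
# output list with $<- instead of list(DT = DT).
read_as_list2 <- function(input) {
  DT <- fread(input)
  ans <- list()
  ans$DT <- DT
  ans
}

# Inline CSV text stands in for a real file path here (an assumption).
csv <- "a,b\n1,10\n2,20\n3,30\n"

res  <- read_as_list(csv)
res2 <- read_as_list2(csv)

# Add a column by reference to the table held in the list.
res$DT[, total := a + b]
```

Either form should hand back a data.table that can be updated by reference with := afterwards. On later R versions, where list() no longer copies, list(DT = DT) should behave the same way.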