Assigning by reference into loaded package datasets

野性不改 2020-12-03 07:40

I am in the process of creating a package that uses a data.table as a dataset and has a couple of functions which assign by reference using :=.

2 answers
  • 2020-12-03 08:18

    Another solution is to save the rda file (which can contain any number of data.table objects) in inst/extdata and to put a file DT.r in the data subdirectory that loads it and over-allocates:

    # get the environment from the call to `data()`
    env <- get('envir', parent.frame(1))
    # load the data
    load(system.file('extdata', 'DT.rda', package = 'foo'), envir = env)
    # over-allocate (evaluating in the correct environment)
    if (require(data.table)) {
      # the contents of `DT.rda` are known, so write out in full
      evalq(alloc.col(DT), envir = env)
    }
    # clean up so the `env` object is not present in the target environment after calling `data(DT)`
    rm(list = c('env'), envir = env)
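    As a rough sketch of the build and load steps, assuming a hypothetical package root under tempdir() instead of a real installed package `foo`, the rda file can be written to inst/extdata and then loaded into a fresh environment and over-allocated there. The names `pkg_root`, `extdata`, `rda_path`, `tl_before` and `tl_after` are illustrative only, and `new.env()` stands in for the environment that `data()` would supply.

```r
library(data.table)

# Hypothetical package layout under a temporary directory (not a real package).
pkg_root <- file.path(tempdir(), "foo")
extdata  <- file.path(pkg_root, "inst", "extdata")
dir.create(extdata, recursive = TRUE, showWarnings = FALSE)
rda_path <- file.path(extdata, "DT.rda")

# Build step: save the table under the name `DT`, without leaving it in the
# calling environment.
local({
  DT <- data.table(b = 1:5)
  save(DT, file = rda_path)
})

# Load step: the same idea as DT.r, but into a fresh environment that stands
# in for the one supplied by data().
env <- new.env()
load(rda_path, envir = env)
tl_before <- truelength(env$DT)

# Over-allocate inside `env`, so the rebinding done by alloc.col() lands there.
invisible(evalq(alloc.col(DT), envir = env))
tl_after <- truelength(env$DT)
```

    Comparing `tl_before` with `tl_after` shows whether spare column slots were reserved for the copy of DT that lives in `env`.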
    
  • 2020-12-03 08:25

    This has nothing to do with datasets or locking; you can reproduce it simply with:

    DT <- unserialize(serialize(data.table(b = 1:5), NULL))
    foo(DT)
    DT
    

    I suspect the cause is that data.table has to re-create the extptr inside the object on the first access to DT, but it does so on a copy, so it cannot share the modification with the original in the global environment.
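    A self-contained version of that reproduction might look like the sketch below. The definition of `foo` is not shown in this thread, so the one here is an assumption inferred from the printed output (it adds a column `a` equal to 1 by reference); `res` is just an illustrative name.

```r
library(data.table)

# Assumed stand-in for the question's foo(): adds column `a` by reference.
foo <- function(x) x[, a := 1]

# Round-tripping through serialize() mimics a table loaded from disk.
DT <- unserialize(serialize(data.table(b = 1:5), NULL))

# Inside foo(), := has to over-allocate first; the resulting shallow copy is
# bound to `x` in foo()'s frame, not to `DT` in the calling environment.
res <- foo(DT)

names(res)  # column names of the table returned by foo()
names(DT)   # column names of the original DT
```

    Comparing the two `names()` calls shows whether the new column reached the original table or only the copy returned by `foo()`.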


    [From Matthew] Exactly.

    DT <- unserialize(serialize(data.table(b = 1:3), NULL))
    DT
       b
    1: 1
    2: 2
    3: 3
    DT[,newcol:=42]
    DT                 # Ok. DT rebound to new shallow copy (when direct)
       b newcol
    1: 1     42
    2: 2     42
    3: 3     42
    
    DT <- unserialize(serialize(data.table(b = 1:3), NULL))
    foo(DT)
       b a
    1: 1 1
    2: 2 1
    3: 3 1
    DT                 # but not ok when via function foo()
       b
    1: 1
    2: 2
    3: 3
    


    DT <- unserialize(serialize(data.table(b = 1:3), NULL))
    alloc.col(DT)      # alloc.col needed first
       b
    1: 1
    2: 2
    3: 3
    foo(DT)
       b a
    1: 1 1
    2: 2 1
    3: 3 1
    DT                 # now it's ok
       b a
    1: 1 1
    2: 2 1
    3: 3 1
    

    Or don't pass DT into the function at all; just refer to it directly. Use data.table like a database: a few fixed-name tables in .GlobalEnv.

    DT <- unserialize(serialize(data.table(b = 1:5), NULL))
    foo <- function() {
       DT[, newcol := 7]
    }
    foo()
       b newcol
    1: 1      7
    2: 2      7
    3: 3      7
    4: 4      7
    5: 5      7
    DT              # Unserialized data.table now over-allocated and updated ok.
       b newcol
    1: 1      7
    2: 2      7
    3: 3      7
    4: 4      7
    5: 5      7
    