saveRDS inflating size of object

难免孤独 2020-12-30 09:55

This is a tricky one as I can't provide a reproducible example, but I'm hoping that others may have had experience dealing with this.

Essentially I have a function

2 answers
  • 2020-12-30 10:04

    It took a bit of digging but I did actually find a solution in the end.

    It turns out the lm model objects were the guilty party, based on this very helpful article:

    https://blogs.oracle.com/R/entry/is_the_size_of_your

    It turns out that the lm.object$terms component includes an environment attribute that references the objects present in the global environment when the model was built. Under certain circumstances, saveRDS will pull those environment objects into the saved object.

    As I had ~0.5GB sitting in the global environment and a list of ~200 lm model objects, the RDS object inflated dramatically: R was actually trying to compress ~100GB of data (roughly 200 copies of 0.5GB).

    To test whether this is what's causing the problem, execute the following code:

    # Serialized size in bytes of each component of the model object
    as.matrix(lapply(lm.object, function(x) length(serialize(x, NULL))))
    

    This shows the serialized size of each component, which will tell you if the $terms component is inflating.
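To see what an inflated component looks like, here is a minimal sketch that builds a model inside a function, so that the formula captures the function's frame (one of the circumstances where the environment is written out in full). The function `fit_in_function`, the variable `big`, and the data are made up for illustration and are not from the original question.

```r
# Hypothetical example: the formula is created inside the function,
# so its environment is the function's frame, which also holds `big`.
fit_in_function <- function() {
  big <- rnorm(1e6)  # large local object, unrelated to the model
  d <- data.frame(
    x = 1:10,
    y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2)
  )
  lm(y ~ x, data = d)
}

fit <- fit_in_function()

# Serialized size in bytes of each component of the fitted model
sizes <- sapply(fit, function(x) length(serialize(x, NULL)))

# Components that drag the captured environment along sort to the top
print(sort(sizes, decreasing = TRUE))
```

A component that carries the environment is orders of magnitude larger than something like `coefficients`, which is what the check above is looking for.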

    The following code will remove the environmental references from the $terms component:

    # Delete every object in the environment referenced by the terms component
    env <- attr(lm.object$terms, ".Environment")
    rm(list = ls(envir = env), envir = env)
    

    Be warned, though: it will also remove all the global environment objects it references.

  • 2020-12-30 10:07

    For model objects you could also simply delete the reference to the environment.

    For example:

    ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
    trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
    group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
    weight <- c(ctl, trt)
    lm.D9 <- lm(weight ~ group) 
    
    # Drop the environment reference before saving
    attr(lm.D9$terms, ".Environment") <- NULL
    saveRDS(lm.D9, file = "path_to_save.RDS")
    

    This unfortunately breaks the model, but you can add an environment back manually after loading it again.

    # Assign the result of readRDS, then reattach an environment
    lm.D9 <- readRDS("path_to_save.RDS")
    attr(lm.D9$terms, ".Environment") <- globalenv()
    

    This helped me in my specific use case and looks a bit safer to me.
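A rough sketch of the same idea for a model built inside a function, where the captured environment is the function's frame rather than the global environment. The helper `strip_lm_env` is a hypothetical name, and `fit_in_function` and its data are made up for illustration. It clears the attribute on `fit$terms` and also on the copy of the terms object attached to the stored model frame, then compares serialized sizes before and after.

```r
# Hypothetical helper: drop the environment captured by the model formula.
strip_lm_env <- function(fit) {
  attr(fit$terms, ".Environment") <- NULL
  # The stored model frame carries its own copy of the terms object
  if (!is.null(fit$model)) {
    attr(attr(fit$model, "terms"), ".Environment") <- NULL
  }
  fit
}

# Made-up model whose formula captures a frame holding a large object
fit_in_function <- function() {
  big <- rnorm(1e6)  # large local object, unrelated to the model
  d <- data.frame(
    x = 1:10,
    y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2)
  )
  lm(y ~ x, data = d)
}

fit <- fit_in_function()

# Compare serialized sizes in bytes before and after stripping
size_before <- length(serialize(fit, NULL))
size_after <- length(serialize(strip_lm_env(fit), NULL))
print(c(before = size_before, after = size_after))
```

As noted above, a stripped model needs an environment reattached after loading before it can be used normally.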
