saveRDS inflating size of object

倖福魔咒の submitted on 2019-12-03 06:53:56

It took a bit of digging but I did actually find a solution in the end.

It turns out it was the lm model objects that were the guilty party. Based on this very helpful article:

https://blogs.oracle.com/R/entry/is_the_size_of_your

The lm.object$terms component includes an environment attribute (".Environment") that references the objects present in the global environment when the model was built. Under certain circumstances, when you call saveRDS, R will try to draw those environment objects into the saved object.

As I had ~0.5GB sitting in the global environment and a list of ~200 lm model objects, this caused the RDS object to inflate dramatically, as it was actually trying to compress ~100GB of data.

To test whether this is what's causing the problem, execute the following code:

as.matrix(lapply(lm.object, function(x) length(serialize(x,NULL)))) 

This shows the serialized size of each component of the model, which will tell you whether the $terms component is the one that is inflated.
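To see what an inflated $terms component looks like, here is a minimal sketch. It assumes the model is fitted inside a function that also holds a large local object; `fit_with_baggage` and `big` are hypothetical names made up for illustration, not part of the original problem.

```r
set.seed(1)

# Hypothetical helper: fits a small model inside a function whose
# local environment also contains a large, unrelated vector.
fit_with_baggage <- function() {
  big <- rnorm(1e6)  # roughly 8 MB that the model itself does not need
  d <- data.frame(x = 1:10, y = (1:10) + rnorm(10))
  lm(y ~ x, data = d)
}

fit <- fit_with_baggage()

# Serialized size in bytes of each component of the model object
sizes <- sapply(fit, function(x) length(serialize(x, NULL)))
print(sort(sizes, decreasing = TRUE))

# The environment attached to $terms still holds the large vector
print(ls(envir = attr(fit$terms, ".Environment")))
```

In this sketch $terms should come out at several megabytes even though the fit only uses ten rows. $model is likely to show up as large too, because the stored model frame carries its own "terms" attribute pointing at the same environment.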

The following code will remove the environmental references from the $terms component:

rm(list=ls(envir = attr(lm.object$terms, ".Environment")), envir = attr(lm.object$terms, ".Environment")) 

Be warned, though: it will also remove all the objects in the global environment that it references.

For model objects you could also simply delete the reference to the environment, for example like this:

ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
lm.D9 <- lm(weight ~ group) 

attr(lm.D9$terms, ".Environment") <- NULL
saveRDS(lm.D9, file = "path_to_save.RDS")

This unfortunately breaks the model, but you can add an environment back manually after loading it again.

# Assign the result of readRDS, otherwise the loaded model is discarded
lm.D9 <- readRDS("path_to_save.RDS")
attr(lm.D9$terms, ".Environment") <- globalenv()

This helped me in my specific use case and looks a bit safer to me.
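As a rough check that the round trip leaves the model usable, the sketch below strips the environment, saves to a temporary file, reloads, re-attaches globalenv() and then compares coefficients and runs a prediction. The names `stripped`, `restored` and `path` are made up for illustration, and tempfile() stands in for a real path.

```r
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
lm.D9 <- lm(weight ~ group)

# Work on a copy so the original model keeps its environment
stripped <- lm.D9
attr(stripped$terms, ".Environment") <- NULL

path <- tempfile(fileext = ".RDS")  # stand-in for a real file path
saveRDS(stripped, file = path)

# Reload and re-attach an environment before using the model
restored <- readRDS(path)
attr(restored$terms, ".Environment") <- globalenv()

# Coefficients should be unchanged by the round trip
print(all.equal(coef(restored), coef(lm.D9)))

# Prediction on new data works again once the environment is back
new_obs <- data.frame(group = factor("Trt", levels = c("Ctl", "Trt")))
pred <- predict(restored, newdata = new_obs)
print(pred)
```

Working on a copy is a design choice here: it keeps lm.D9 intact in the current session, so only the object written to disk loses its environment.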
