Cannot release memory with knitr

我怕爱的太早我们不能终老 提交于 2020-01-15 12:25:08

问题


I have an issue with knitr where I run can the code in the console without a problem but run out of memory when I knit the document. The markdown document is similar to

---
title: "xyz"
output: 
  html_document: 
    toc: true
date: "`r format(Sys.time(), '%d %B, %Y')`"
author: Me
bibliography: ../ref.bib
---
```{r setup, include = FALSE, cache = FALSE}
options(width = 100, digits = 3, scipen = 8)
knitr::opts_chunk$set(
  error = FALSE, cache = FALSE, 
  cache.path = "some-path-cache/", fig.path = "some-path-fig/", 
  warnings = TRUE, message = TRUE, dpi = 128, cache.lazy = FALSE)
```

[some code]

```{r load_dat}
big_dat <- func_to_get_big_dat()
some_subset <- func_to_get_subset()
```

[some code where both big_dat and some_subset is used, some objects are assigned and some are subsequently removed with rm]

```{r reduce_mem}
dat_fit <- big_dat[some_subset, ]
rm(big_dat)
```
```{r log_to_show}
sink("some-log-file")
print(gc())
print(sapply(ls(), function(x) paste0(class(get(x)), collapse = ";")))
print(sort(sapply(ls(), function(x) object.size(get(x)))))
sink()
```
```{r some_chunk_that_requires_a_lot_of_memory, cache = 1}
...
```

When I knit the document using knitr then I run out of memory in the some_chunk_that_requires_a_lot_of_memory and the content of some-log-file is

            used (Mb) gc trigger (Mb)  max used (Mb)
Ncells   3220059  172    5684620  304   5684620  304
Vcells 581359200 4436 1217211123 9287 981188369 7486    
[output abbreviated (the other variables are "function"s, "character"s, and "matrix"s] 
     dat_fit          X1   some_subset     
"data.frame"   "integer"     "integer"
[output abbreviated]
     X1   some_subset     dat_fit 
5235568       5235568   591631352

so the objects in the .GlobalEnv far from sums to the 4436 MB (there are not many objects and they far smaller than 50 MB each). Running the code in the console does not yield any issues and the print(gc()) shows a much smaller figure.

My questions are

  1. Can I do something to figure out why I use much more memory when I knit the document? Clearly, there must be assigned some objects somewhere that takes up a lot of space. Can I find all assigned objects and check their size?
  2. Do you have some suggestion why gc release less memory when I knit the document? Is there somewhere were knitr assigns some object that may take up a lot of memory?

The data set is proprietary and I have tried but failed to make small example where I can reproduce the result. As a note, I do cache some output from some chunks between load_dat and reduce_mem. I use cache.lazy = FALSE to avoid this issue. Here is my sessionInfo

library(knitr)
sessionInfo()
#R R version 3.4.2 (2017-09-28)
#R Platform: x86_64-w64-mingw32/x64 (64-bit)
#R Running under: Windows 7 x64 (build 7601) Service Pack 1
#R 
#R Matrix products: default
#R 
#R locale:
#R [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
#R [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    
#R 
#R attached base packages:
#R [1] stats     graphics  grDevices utils     datasets  methods   base     
#R 
#R other attached packages:
#R [1] knitr_1.17
#R 
#R loaded via a namespace (and not attached):
#R [1] compiler_3.4.2 tools_3.4.2    yaml_2.1.16 

Regarding question 1.

I also added the following to the log_to_show chunk to figure out if there are objects in other environments in the session that takes up a lot of space

# function to check if `this_env` is in `l`
is_env_in_list <- function(l, this_env){
  for(i in l)
    if(identical(i, this_env))
      return(TRUE)

  FALSE
}

# remove duplicates for environments 
remove_dup_envs <- function(objs){
  do_drop <- logical(length(objs))
  for(j in rev(seq_along(objs))){
    for(i in seq_len(j - 1L)){
      if(identical(objs[[i]], objs[[j]])){
        do_drop[j] <- TRUE
        break
      }
    }
  }
  objs[!do_drop]
}

# attempt to write function to get all unique environments 
get_env <- function(this_env = .GlobalEnv, out = NULL, only_new = FALSE){
  if(is_env_in_list(out, this_env))
    return(if(only_new) NULL else out)

  if(identical(this_env, emptyenv()))
    return(if(only_new) NULL else out)
  new. <- this_env # not emptyenv or in list so we add it

  # add parent env
  p_env <- parent.env(this_env)
  if(!is_env_in_list(out, p_env))
    new. <- c(new., get_env(p_env, out, only_new  = only_new))

  # look through assigned objects, find enviroments and add these
  objs <- lapply(ls(envir = this_env), function(x){
    o <- try(get(x, envir = this_env), silent = TRUE)
    if(inherits(o, "try-error"))
      NULL
    o
  })
  objs <- lapply(objs, function(x){
    if(is.function(x) && !is.null(environment(x)))
      return(environment(x))
    x
  })
  if(length(objs) == 0)
    return(if(only_new) new. else remove_dup_envs(c(new., out)))

  is_env <- which(sapply(objs, is.environment))
  if(length(is_env) == 0)
    return(if(only_new) new. else remove_dup_envs(c(new., out)))

  objs <- remove_dup_envs(objs[is_env])
  keep <- which(!sapply(objs, is_env_in_list, l = c(new., out)))
  if(length(keep) == 0L)
    return(if(only_new) new. else c(new., out))

  objs <- objs[keep]
  for(o in objs){
    ass_envs <- get_env(o, out = c(new., out), only_new = TRUE)
    new. <- c(new., ass_envs)
  }

  return(if(only_new) new. else remove_dup_envs(c(new., out)))
}

tmp <- get_env(asNamespace("knitr"))
names(tmp) <- sapply(tmp, environmentName)
print(tmp <- tmp[order(names(tmp))])
out <- lapply(tmp, function(x){
  o <- sapply(ls(envir = x), function(z){
    r <- try(object.size(get(z, envir = x)), silent = TRUE)
    if(inherits(r, "try-error"))
      return(0)
    r
  })
  if(length(o) == 0L)
    return(NULL)
  tail(sort(o))
})
max_val <- sapply(out, max)
keep <- which(max_val > 10^7)
out <- out[keep]
max_val <- max_val[keep]
tmp <- tmp[keep]

ord <- order(max_val)
print(tmp <- tmp[ord])
print(out <- out[ord])

It shows no objects that are larger than dat_fit.

来源:https://stackoverflow.com/questions/52259676/cannot-release-memory-with-knitr

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!