Memory profiling in R: how to find the place of maximum memory usage?


Question


My code eats up to 3 GB of memory at a single point in time. I figured that out using gc():

gc1 <- gc(reset = TRUE)  # reset the "max used" statistics
graf(...)                # the code under test
gc2 <- gc()
# column 6 is "max used (Mb)", column 2 is "used (Mb)", so this is the
# peak usage during the call minus the usage right before it;
# sum() adds up the Ncells and Vcells rows of the gc() matrix
cat(sprintf("mem: %.1fMb.\n", sum(gc2[, 6] - gc1[, 2])))
# mem: 3151.7Mb.

Which I take to mean that, at some single point in time, 3151.7 MB were allocated at once.

My goal is to minimize the maximum memory allocated at any single time. How do I figure out which part of my code is responsible for the peak usage of those 3 GB of memory? I.e. the place where those 3 GB are allocated at once.

  1. I tried memory profiling with Rprof and profvis, but both seem to show different information (which seems undocumented, see my other question). Maybe I need to use them with different parameters (or a different tool?).

  2. I've been looking at Rprofmem... but:

    • in the profmem vignette they wrote: "with utils::Rprofmem() it is not possible to quantify the total memory usage at a given time because it only logs allocations and does therefore not reflect deallocations done by the garbage collector."
    • how do I output the result of Rprofmem? This source speaks for itself: "Summary functions for this output are still being designed". (A sketch of both approaches, Rprof's memory profiling and the profmem front end to Rprofmem, follows this list.)
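
For reference, a minimal sketch of how these can be invoked (graf(...) stands for the code under test, as above; profmem is a CRAN package that summarizes the Rprofmem allocation log):

Rprof("mem.out", memory.profiling = TRUE, interval = 0.01)
graf(...)     # the code under test
Rprof(NULL)   # stop profiling
# memory = "both" adds allocation columns to the usual timing summary,
# memory = "tseries" returns the raw allocation time series instead
summaryRprof("mem.out", memory = "both")

# profmem wraps utils::Rprofmem(); it logs allocations only and does
# not reflect what the garbage collector frees (see the quote above)
# install.packages("profmem")
p <- profmem::profmem(graf(...))
print(p)
profmem::total(p)   # total number of bytes allocated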

Answer 1:


My code eats up to 3GB of memory at a single time.

While it looks like your code consumes a lot of RAM at once by calling one function, you can break the memory consumption down into the implementation details of that function (and its sub-calls) by using RStudio's built-in profiling (based on profvis) to see the execution time and rough memory consumption. E.g., using my demo code:

  # graf code taken from the tutorial at
  # https://rawgit.com/goldingn/intecol2013/master/tutorial/graf_workshop.html
  library(dismo)  # install.packages("dismo")
  library(GRaF)   # install_github('goldingn/GRaF')

  data(Anguilla_train)

  # loop to call the code under test several times to get better profiling results
  for (i in 1:5) {

    # keep the SegSumT, SegTSeas and Method columns as covariates
    covs <- Anguilla_train[, c("SegSumT", "SegTSeas", "Method")]

    # use the presence/absence status to fit a simple model
    m1 <- graf(Anguilla_train$Angaus, covs)
  }

Start profiling with the Profile > Start Profiling menu item, source the above code and stop the profiling via the same menu.
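
The same profiling can also be started programmatically instead of via the menu (a minimal sketch, reusing the data and libraries loaded in the demo code above):

  library(profvis)
  p <- profvis({
    for (i in 1:5) {
      covs <- Anguilla_train[, c("SegSumT", "SegTSeas", "Method")]
      m1 <- graf(Anguilla_train$Angaus, covs)
    }
  })
  print(p)   # opens the interactive flame graph / data view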

After Profile > Stop Profiling, RStudio shows the result as a flame graph, but what you are looking for is hidden in the Data tab of the profile result (unfold the function calls that show heavy memory consumption).

The numbers in the memory column indicate the memory allocated (positive) and deallocated (negative) for each called function; each value should include the memory used directly in the function plus the sum over its whole sub-call tree.

My goal is to minimize the maximum memory allocated at any single time.

Why do you want to do that? Do you run out of memory, or do you suspect that repeated memory allocation is causing long execution times?

High memory consumption (or repeated allocations/deallocations) often comes together with slow execution, since copying memory costs time.
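
Base R's tracemem() helps to spot such hidden copies: it marks an object and prints a message every time that object is duplicated (a minimal sketch; CRAN binaries are built with the required memory-profiling support):

  x <- runif(1e6)
  tracemem(x)    # mark x; every duplication of it is now reported
  y <- x         # no copy yet thanks to copy-on-modify semantics
  y[1] <- 0      # modifying the shared vector triggers a (reported) copy
  untracemem(x)  # stop tracing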

So look at the Memory or Time column, depending on your optimization goal, to find function calls with high values.

If you look into the source code of the GRaF package, you can find a loop in the graf.fit.laplace function (up to 50 Newton iterations) that calls "slow" R-internal functions like chol, backsolve and forwardsolve, but also slow functions implemented in the package itself (like cov.SE.d1).
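
You can inspect that unexported function yourself; ::: accesses the internals of an installed package (a sketch):

  library(GRaF)
  GRaF:::graf.fit.laplace          # prints the source of the fitting loop
  getAnywhere("graf.fit.laplace")  # alternative lookup via utils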

Now you can try to find faster (or less memory consuming) replacements for these functions... (sorry, I can't help here).

PS: profvis uses Rprof internally, so the profiling data is collected by probing the current memory consumption at regular time intervals and attributing it to the currently active function (call stack).

Rprof has limitations. The result is not exact, since the garbage collector triggers at non-deterministic times and the freed memory is attributed to whichever function the next probing interval happens to stop in, and it does not recognize memory allocated directly from the OS via C/C++ code or libraries that bypass R's memory-management API. Still, it is the easiest approach and normally gives a good enough indication of memory and performance problems.

For an introduction to profvis see: https://rstudio.github.io/profvis/



Source: https://stackoverflow.com/questions/58250531/memory-profiling-in-r-how-to-find-the-place-of-maximum-memory-usage
