What tricks do people use to manage the available memory of an interactive R session? I use the functions below [based on postings by Petr Pikal and David Hinds to the r-he
The use of environments instead of lists to handle collections of objects which occupy a significant amount of working memory.
The reason: each time an element of a list
structure is modified, the whole list is temporarily duplicated. This becomes an issue if the storage requirement of the list is about half the available working memory, because then data has to be swapped to the slow hard disk. Environments, on the other hand, aren't subject to this behaviour and they can be treated similar to lists.
Here is an example:
get.data <- function(x)
{
# get some data based on x
return(paste("data from",x))
}
collect.data <- function(i,x,env)
{
# get some data
data <- get.data(x[[i]])
# store data into environment
element.name <- paste("V",i,sep="")
env[[element.name]] <- data
return(NULL)
}
better.list <- new.env()
filenames <- c("file1","file2","file3")
lapply(seq_along(filenames),collect.data,x=filenames,env=better.list)
# read/write access
print(better.list[["V1"]])
better.list[["V2"]] <- "testdata"
# number of list elements
length(ls(better.list))
In conjunction with structures such as big.matrix
or data.table
which allow for altering their content in-place, very efficient memory usage can be achieved.