What tricks do people use to manage the available memory of an interactive R session? I use the functions below [based on postings by Petr Pikal and David Hinds to the r-he
I make aggressive use of the subset
parameter with selection of only the required variables when passing dataframes to the data=
argument of regression functions. It does result in some errors if I forget to add variables to both the formula and the select=
vector, but it still saves a lot of time due to decreased copying of objects and reduces the memory footprint significantly. Say I have 4 million records with 110 variables (and I do.) Example:
# library(rms); library(Hmisc) for the cph,and rcs functions
Mayo.PrCr.rbc.mdl <-
cph(formula = Surv(surv.yr, death) ~ age + Sex + nsmkr + rcs(Mayo, 4) +
rcs(PrCr.rat, 3) + rbc.cat * Sex,
data = subset(set1HLI, gdlab2 & HIVfinal == "Negative",
select = c("surv.yr", "death", "PrCr.rat", "Mayo",
"age", "Sex", "nsmkr", "rbc.cat")
) )
By way of setting context and the strategy: the gdlab2
variable is a logical vector that was constructed for subjects in a dataset that had all normal or almost normal values for a bunch of laboratory tests and HIVfinal
was a character vector that summarized preliminary and confirmatory testing for HIV.