I'm working with a large data frame and have run up against RAM limits. At this point, I probably need to work with a serialized version on disk. There are a few packages that support working out of memory, but I'm not sure which one will suit my needs.
If you are dealing with memory issues, try the following steps:
Close other processes that consume RAM. In particular, avoid keeping a browser open with many tabs, as they tend to use a lot of memory.
Once step 1 is done, get a feel for the structure of your dataset by reading only the first rows, e.g. read.csv(..., nrows = 100). This shows you which columns exist and what their classes are; drop any columns you don't need.
Once you know the column classes (colClasses), you can import the entire data frame in one go.
Here is some sample code:
initial <- read.table("datatable.txt", nrows = 100)            # read a small sample
classes <- sapply(initial, class)                               # infer each column's class
tabAll  <- read.table("datatable.txt", colClasses = classes)    # full read with known classes
Use data.table::fread() to read large data frames; it is much faster than read.table() and can read only the columns you need.
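Here is a minimal sketch; the select argument and its column names are placeholders for the columns you actually need:

library(data.table)
tabAll <- fread("datatable.txt")                               # fast, multi-threaded read
tabSub <- fread("datatable.txt", select = c("col1", "col3"))   # hypothetical: read only two columns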
If that still does not solve the problem, split the dataset by rows into two equal parts, apply a dimensionality-reduction technique to each part, and then merge the results.
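As a rough sketch, here is one way to read the file in two row-wise chunks before combining them; the row count is a placeholder and the dimensionality-reduction step itself is left as a comment:

n_half <- 500000                                    # placeholder: roughly half the rows
part1  <- read.table("datatable.txt", nrows = n_half, colClasses = classes)
part2  <- read.table("datatable.txt", skip = n_half, colClasses = classes,
                     col.names = names(part1))
# ... apply your dimensionality-reduction step to part1 and part2 here ...
tabAll <- rbind(part1, part2)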
I hope it helps.
You probably want to look at these packages: ff for flat-file-backed storage of data.frame-like objects, bigmemory for file-backed out-of-memory matrices, and biglm for out-of-memory fitting of lm()- and glm()-style models. Also see the High-Performance Computing task view on CRAN.
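If model fitting is what hits the memory limit, a minimal sketch with biglm could look like the following; the formula and the chunk1/chunk2 data frames are placeholders for your own data split into pieces that fit in RAM:

library(biglm)
fit <- biglm(y ~ x1 + x2, data = chunk1)   # hypothetical formula, fit on the first chunk
fit <- update(fit, chunk2)                 # fold the next chunk into the same fit
summary(fit)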
I would say disk.frame is a good candidate for this type of task. I am the primary author of the package. Unlike ff and bigmemory, which restrict what data types can be easily handled, it tries to mimic data.frames and provides dplyr verbs for manipulating the data.
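A rough sketch of a typical workflow, assuming a hypothetical datatable.csv with columns x1 and x2:

library(disk.frame)
library(dplyr)
setup_disk.frame()                          # start background workers for chunked processing
dff <- csv_to_disk.frame("datatable.csv")   # split the CSV into on-disk chunks
result <- dff %>%
  filter(x1 > 0) %>%                        # verbs are applied lazily, chunk by chunk
  select(x1, x2) %>%
  collect()                                 # materialise the result as a regular data frame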