I'm working with a large data frame and have run up against RAM limits. At this point, I probably need to work with a serialized version on disk. There are a few packages that support working out of memory, but I'm not sure which one will suit my needs.
If you are dealing with memory issues, try the following steps:
Close other processes that consume RAM. In particular, avoid keeping a browser open with many tabs, as they tend to use a lot of memory.
Once step 1 is done, get a feel for the structure of your dataset by reading only the first rows, e.g. read.csv(..., nrows = 100). This shows you which columns exist and what their classes are; drop any columns you don't need.
Once you know the column classes (colClasses), you can import the entire data frame in one go.
Here is some sample code:
initial <- read.table("datatable.txt", nrows = 100)            # read a small sample
classes <- sapply(initial, class)                               # infer each column's class
tabAll  <- read.table("datatable.txt", colClasses = classes)    # full read with known classes
Use data.table::fread() to read large data frames; it is much faster than read.table() and can read only the columns you need.
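Here is a minimal sketch; the select argument and its column names are placeholders for the columns you actually need:

library(data.table)
tabAll <- fread("datatable.txt")                               # fast, multi-threaded read
tabSub <- fread("datatable.txt", select = c("col1", "col3"))   # hypothetical: read only two columns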
If that still does not solve the problem, split the dataset by rows into two equal parts, apply a dimensionality-reduction technique to each part, and then merge the results.
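As a rough sketch, here is one way to read the file in two row-wise chunks before combining them; the row count is a placeholder and the dimensionality-reduction step itself is left as a comment:

n_half <- 500000                                    # placeholder: roughly half the rows
part1  <- read.table("datatable.txt", nrows = n_half, colClasses = classes)
part2  <- read.table("datatable.txt", skip = n_half, colClasses = classes,
                     col.names = names(part1))
# ... apply your dimensionality-reduction step to part1 and part2 here ...
tabAll <- rbind(part1, part2)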
I hope it helps.
You probably want to look at these packages: ff for flat-file-backed storage of data.frame-like objects, bigmemory for file-backed out-of-memory matrices, and biglm for out-of-memory fitting of lm()- and glm()-style models. Also see the High-Performance Computing task view on CRAN.
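If model fitting is what hits the memory limit, a minimal sketch with biglm could look like the following; the formula and the chunk1/chunk2 data frames are placeholders for your own data split into pieces that fit in RAM:

library(biglm)
fit <- biglm(y ~ x1 + x2, data = chunk1)   # hypothetical formula, fit on the first chunk
fit <- update(fit, chunk2)                 # fold the next chunk into the same fit
summary(fit)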
I would say disk.frame is a good candidate for this type of task. I am the primary author of the package. Unlike ff and bigmemory, which restrict what data types can be easily handled, it tries to mimic data.frames and provides dplyr verbs for manipulating the data.
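A rough sketch of a typical workflow, assuming a hypothetical datatable.csv with columns x1 and x2:

library(disk.frame)
library(dplyr)
setup_disk.frame()                          # start background workers for chunked processing
dff <- csv_to_disk.frame("datatable.csv")   # split the CSV into on-disk chunks
result <- dff %>%
  filter(x1 > 0) %>%                        # verbs are applied lazily, chunk by chunk
  select(x1, x2) %>%
  collect()                                 # materialise the result as a regular data frame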