How to read a subset of large dataset in R?

…衆ロ難τιáo~ 提交于 2019-12-18 04:39:06

问题


I have a dataset with about 2 million rows, so without reading the whole dataset I want to read a subset of dataset . My dataset contains a date column in it so I just want to read dataset between a date range without reading whole dataset as it will be time consuming and memory waste. so how to accomplish it can anyone guide me on this ?


回答1:


Use skip= parameter in read.table

read.table("file.txt",skip= ,nrows= )

Both the skip= and nrows= take in row indicator numbers so just add them after the=.

The nrows= defines how deep you range when you are importing the file.

I suggest reading https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html if you haven't done so already.

Also, please see one of my questions:

R - Reading lines from a .txt-file after a specific line

It, somewhat, touches the same subject.

The other possible way might be to use grep() in skip=

read.table(...,skip=grep("2005-12-31", readLines("File.txt")),nrows=365)

What this line does is it skips until it finds the line depicted in grep() and reads the lines after that. The nrow= will stop the reading after it has read 365 lines (this way you have read one year of dates provided one line equals one date).

This seems kinda complicated, but it's the only way I know how to solve this.



来源:https://stackoverflow.com/questions/25932628/how-to-read-a-subset-of-large-dataset-in-r

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!