Question
I have a 10 GB Stata .dta file and I am trying to read it into 64-bit R 3.3.1. I am working on a virtual machine with about 130 GB of RAM (and a 4 TB hard disk), and the .dta file has about 3 million rows and somewhere between 400 and 800 variables.
I know the data.table package's fread() is the fastest way to read .txt and .csv files, but does anyone have a recommendation for reading largish .dta files into R? Opening the file in Stata takes about 20-30 seconds, although I first have to raise Stata's maximum working memory before opening it (I set the maximum to 100 GB).
I have not tried exporting the file to .csv from Stata, and I would prefer not to touch the file with Stata at all. One solution is given in "Using memisc to import stata .dta file into R", but it assumes RAM is scarce. In my case, I should have enough RAM to hold the whole file in memory.
Answer 1:
The fastest way to load a large Stata dataset into R is the readstata13 package. I compared the performance of the foreign, readstata13, and haven packages on a large dataset in this post, and the results repeatedly showed that readstata13 is the fastest package available for reading Stata datasets in R.
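A minimal sketch of reading a .dta file with readstata13, assuming both readstata13 and haven are installed. Because the original 10 GB file is not available, the sketch first writes a tiny hypothetical file, demo.dta, to a temporary directory with haven::write_dta() and then reads it back with readstata13::read.dta13(); with real data you would pass the path to your own file instead.

```r
library(haven)        # only used here to create a small demo file
library(readstata13)  # provides read.dta13()

# Hypothetical stand-in for the real file: a tiny data frame with one
# value-labelled variable, written to a temporary .dta file.
demo <- data.frame(id = 1:3)
demo$sex <- labelled(c(1L, 2L, 1L), c(male = 1L, female = 2L))
path <- file.path(tempdir(), "demo.dta")
write_dta(demo, path)

# read.dta13() returns a base data.frame; with the default
# convert.factors = TRUE, Stata value labels become R factors.
dat <- read.dta13(path)
str(dat)
```

Returning a plain data.frame with factors keeps the result close to what foreign::read.dta() produced, which can make switching packages easier.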
Answer 2:
I recommend the haven R package. Unlike foreign, it can read the latest Stata formats:
```r
library(haven)
# read_dta() returns a tibble; variables with Stata value labels become labelled vectors
data <- read_dta('myfile.dta')
```
I'm not sure how fast it is compared with the other options, but your choices for reading Stata files in R are rather limited. My understanding is that haven wraps a C library (ReadStat), so it is probably your fastest option.
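To check which reader is faster on your own machine, a rough timing sketch follows. It assumes haven and readstata13 are installed and builds a hypothetical synthetic file in a temporary directory. The sizes n_rows and n_cols are arbitrary placeholders far smaller than the real 3-million-row file, and a single system.time() run is noisy, so for a meaningful comparison point path at the real file and repeat each read a few times.

```r
library(haven)        # read_dta(), write_dta()
library(readstata13)  # read.dta13()

# Hypothetical synthetic dataset: n_rows x n_cols numeric columns (V1, V2, ...).
n_rows <- 50000
n_cols <- 50
set.seed(1)
synthetic <- as.data.frame(matrix(rnorm(n_rows * n_cols), ncol = n_cols))
path <- file.path(tempdir(), "synthetic.dta")
write_dta(synthetic, path)

# Time each reader once; "elapsed" is wall-clock time in seconds.
t_haven <- system.time(d_haven <- read_dta(path))[["elapsed"]]
t_rs13  <- system.time(d_rs13  <- read.dta13(path))[["elapsed"]]

timings <- c(haven = t_haven, readstata13 = t_rs13)
print(timings)
```

Elapsed (wall-clock) time is used rather than user time because it is what you actually wait for when loading the file.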
Source: https://stackoverflow.com/questions/38820594/r-how-to-quickly-read-large-dta-files-without-ram-limitations