R: How to quickly read large .dta files without RAM limitations


Question


I have a 10 GB Stata .dta file that I am trying to read into 64-bit R 3.3.1. I am working on a virtual machine with about 130 GB of RAM (and a 4 TB hard drive); the file has about 3 million rows and somewhere between 400 and 800 variables.

I know data.table's fread() is the fastest way to read in .txt and .csv files, but does anyone have a recommendation for reading largish .dta files into R? Reading the file into Stata itself takes about 20-30 seconds, although I need to raise Stata's working-memory maximum before opening the file (I set the max at 100 GB).

I have not tried exporting to .csv from Stata, as I hope to avoid touching the file with Stata at all. A solution is described in Using memisc to import Stata .dta file into R, but that approach assumes RAM is scarce; in my case I should have sufficient RAM to hold the file.


Answer 1:


The fastest way to load a large Stata dataset into R is the readstata13 package. I have compared the performance of the foreign, readstata13, and haven packages on a large dataset in this post, and the results repeatedly showed that readstata13 is the fastest available package for reading Stata datasets into R.
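For reference, a minimal sketch of reading the file with readstata13; 'myfile.dta' is a placeholder name, and the convert.factors argument shown is optional:

library(readstata13)
# read.dta13() reads newer Stata formats; convert.factors = FALSE skips
# converting labelled variables to factors, which can save time and memory
data <- read.dta13("myfile.dta", convert.factors = FALSE)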




Answer 2:


I recommend the haven R package. Unlike foreign, it can read the latest Stata formats:

library(haven)
data <- read_dta('myfile.dta')  # returns a tibble (a data frame)

I am not sure how fast it is compared to the other options, but your choices for reading Stata files in R are rather limited. My understanding is that haven wraps a C library (ReadStat), so it's probably your fastest option.
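If you want to check the speed claims on your own file, here is a hedged sketch using the microbenchmark package; "myfile.dta" is a placeholder, and times is kept low because each read of a 10 GB file is slow:

library(microbenchmark)
# time each reader on the same file, three runs apiece
microbenchmark(
  haven       = haven::read_dta("myfile.dta"),
  readstata13 = readstata13::read.dta13("myfile.dta"),
  times = 3
)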



Source: https://stackoverflow.com/questions/38820594/r-how-to-quickly-read-large-dta-files-without-ram-limitations
