How much data can R handle? [closed]

Submitted by 此生再无相见时 on 2019-12-20 19:43:13

Question


By "handle" I mean manipulate multi-columnar rows of data. How does R stack up against tools like Excel, SPSS, SAS, and others? Is R a viable tool for looking at "BIG DATA" (hundreds of millions to billions of rows)? If not, which statistical programming tools are best suited for analysis large data sets?


Answer 1:


If you look at the High-Performance Computing Task View on CRAN, you will get a good idea of what R can do in terms of high-performance computing.




Answer 2:


You can in principle store as much data as you have RAM, with the exception that, currently, vectors and matrices are restricted to 2^31 - 1 elements because R uses 32-bit indexes on vectors. Generic vectors (lists, and their derivatives such as data frames) are likewise restricted to 2^31 - 1 components, and each of those components has the same restriction as vectors/matrices/data frames, etc.
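A minimal sketch of how to see this limit from within R (the exact memory figures depend on your machine; note that R >= 3.0.0 later relaxed this limit for atomic vectors via so-called long vectors):

    ## R's maximum 32-bit index, i.e. 2^31 - 1
    .Machine$integer.max
    #> [1] 2147483647

    ## A full-length double vector at that limit would alone need ~16 GB:
    (2^31 - 1) * 8 / 1024^3
    #> [1] 16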

Of course these are theoretical limits: if you want to do anything with data in R, it will inevitably require space to hold at least a couple of copies, as R will usually copy data passed in to functions, etc.

There are efforts to allow on-disk storage (rather than in-RAM), but even those are restricted to the 2^31 - 1 limits mentioned above for whatever is in use in R at any one time. See the Large memory and out-of-memory data section of the High-Performance Computing Task View linked to in @Roman's post.
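For example, a minimal sketch with the bigmemory package (one of the packages covered in that section; the file names here are made up):

    ## install.packages("bigmemory") first
    library(bigmemory)

    ## A file-backed matrix: the data live on disk, and only the slices
    ## you index are pulled into RAM -- but indexing is still subject to
    ## the 2^31 - 1 limit discussed above.
    x <- filebacked.big.matrix(nrow = 1e6, ncol = 3, type = "double",
                               backingfile = "big.bin",
                               descriptorfile = "big.desc")

    x[1:5, ] <- rnorm(15)   # write a small slice
    x[1:5, ]                # read it back without loading the whole matrix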




Answer 3:


Perhaps a good indication of its suitability for "big data" is the fact that R has emerged as the platform of choice for developers competing in Kaggle.com data-modeling competitions. See the article on the Revolution Analytics website: R beats out SAS and SPSS by a healthy margin. What R lacks in out-of-the-box number-crunching power it apparently makes up for in flexibility.

In addition to what's available on the web, there are several new books on how to hot-rod R for tackling big data. The Art of R Programming (Matloff 2011; No Starch Press) provides introductions to writing optimized R code, parallel computing, and using R in conjunction with C. The entire book is well written, with great code samples and walk-throughs. Parallel R (McCallum & Weston 2011; O'Reilly) looks good too.
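As a taste of the parallel-computing material those books cover, here is a minimal sketch using the base parallel package (shipped with R since 2.14; no claim that this matches the books' own examples):

    library(parallel)

    cl <- makeCluster(detectCores() - 1)        # leave one core free
    res <- parLapply(cl, 1:8, function(i) i^2)  # run the function across workers
    stopCluster(cl)

    unlist(res)
    #> [1]  1  4  9 16 25 36 49 64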




Answer 4:


I'll share my short story with R and a big data set.
I had a connector from R to an RDBMS,

  • where I stored 80 million compounds.

I built queries that gathered some subset of this data,
then manipulated that subset in R.
R was simply choking with more than 200k rows in memory on my PC:

  • Core Duo
  • 4 GB RAM

So working on a subset appropriate for your machine is a good approach.
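A minimal sketch of that pattern, using DBI with RSQLite for illustration (the database file, table, and column names are hypothetical; the original poster's RDBMS and connector are not specified):

    library(DBI)
    library(RSQLite)

    con <- dbConnect(SQLite(), "compounds.db")   # hypothetical database file

    ## Let the database do the filtering and pull only a manageable
    ## subset into R, rather than all 80 million rows.
    subset_df <- dbGetQuery(con,
      "SELECT * FROM compounds WHERE mol_weight < 500 LIMIT 200000")

    summary(subset_df)   # work with the in-memory subset
    dbDisconnect(con)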



Source: https://stackoverflow.com/questions/5527850/how-much-data-can-r-handle
