Manipulation of Large Files in R

忘掉有多难 · 2020-12-05 21:20

I have 15 files of data, each around 4.5GB. Each file is a month's worth of data for around 17,000 customers. All together, the data represents information on 17,000 customer

2 Answers
  •  伪装坚强ぢ · 2020-12-05 22:11

    I think you already have your answer. But to reinforce it, see the official manual

    R Data Import/Export

    which states:

    In general, statistical systems like R are not particularly well suited to manipulations of large-scale data. Some other systems are better than R at this, and part of the thrust of this manual is to suggest that rather than duplicating functionality in R we can make another system do the work! (For example Therneau & Grambsch (2000) commented that they preferred to do data manipulation in SAS and then use package survival in S for the analysis.) Database manipulation systems are often very suitable for manipulating and extracting data: several packages to interact with DBMSs are discussed here.

    So clearly storing massive data is not R's primary strength, yet R provides interfaces to several tools that specialize in exactly this. In my own work the lightweight SQLite solution is enough, though to some extent that's a matter of preference. Search for "drawbacks of using SQLite" and you probably won't find much to dissuade you.

    You should find SQLite's documentation easy to follow. If you have enough programming experience, working through a tutorial or two should get you going quickly on the SQL front. I don't see anything overly complicated going on in your code, so the most common and basic statements such as CREATE TABLE and SELECT ... WHERE will likely meet all your needs, as in the sketch below.
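
    As a minimal sketch of that workflow, using the DBI and RSQLite packages: the file names (month_01.csv, ...), the table name transactions, and the column customer_id are all assumptions for illustration, not details from the question.

        library(DBI)

        con <- dbConnect(RSQLite::SQLite(), "customers.sqlite")

        ## Load each monthly file into a single table; append = TRUE
        ## accumulates rows across files. For 4.5GB files you may need to
        ## read in chunks (e.g. readr::read_csv_chunked) rather than all
        ## at once. File and column names here are hypothetical.
        for (f in sprintf("month_%02d.csv", 1:15)) {
          monthly <- read.csv(f)
          monthly$month <- f   # remember which file each row came from
          dbWriteTable(con, "transactions", monthly, append = TRUE)
        }

        ## A basic SELECT ... WHERE: fetch one customer's rows instead of
        ## pulling all ~67GB of data into memory.
        cust <- dbGetQuery(con,
          "SELECT * FROM transactions WHERE customer_id = 12345")

        dbDisconnect(con)

    Once the data lives in the database, only the rows a query actually returns occupy R's memory, which is the whole point of delegating storage to the DBMS.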

    Edit

    Another advantage of using a DBMS that I didn't mention: views let you expose the same data under alternative organizations. By creating views, you can go back to the "by month" layout without rewriting any table or duplicating any data; see the sketch below.
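
    Continuing the hypothetical schema above (where each row carries a month column recording its source file), a view can stand in for a per-month table:

        library(DBI)
        con <- dbConnect(RSQLite::SQLite(), "customers.sqlite")

        ## A view behaves like a table when queried but stores no extra
        ## data; it is just a saved SELECT over the underlying table.
        dbExecute(con, "
          CREATE VIEW IF NOT EXISTS january AS
          SELECT * FROM transactions WHERE month = 'month_01.csv'
        ")

        jan <- dbGetQuery(con, "SELECT * FROM january LIMIT 10")

        dbDisconnect(con)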
