How to load data quickly into R?

前端 未结 4 1757
天命终不由人
天命终不由人 2020-12-02 07:23

I have some R scripts, where I have to load several dataframe in R as quickly as possible. This is quite important as reading the data is the slowest part of the procedure.

4条回答
  •  谎友^
    谎友^ (楼主)
    2020-12-02 08:08

    It depends on what you want to do and how you process the data further. In any case, loading from a binary R object is always going to be faster, provided you always need the same dataset. The limiting speed here is the speed of your harddrive, not R. The binary form is the internal representation of the dataframe in the workspace, so there is no transformation needed anymore.

    Any kind of text file is a different story, as you include invariably an overhead : each time you read in the text file, the data has to be transformed to the binary R object. I'd forget about them. They are only useful for porting datasets from one application to another.

    Setting up a MySQL backend is very useful if you need different parts of the data, or different subsets in different combinations. Especially when working with huge datasets, the fact that you don't have to load in the whole dataset before you can start selecting the rows/columns, can gain you quite some time. But this only works with huge datasets, as reading a binary file is quite a bit faster than searching a database.

    If the data is not too big, you can save different dataframes in one RData file, giving you the opportunity to streamline things a bit more. I often have a set of dataframes in a list or in a seperate environment (see also ?environment for some simple examples). This allows for lapply / eapply solutions to process multiple dataframes at once.

提交回复
热议问题