Difference between the ff and filehash packages in R [closed]

Question

I have a data frame composed of 25 columns and ~1M rows, split into 12 files. I need to import them and then use a reshape package to do some data management. Each file is so large that I have to look for a "non-RAM" solution for importing and processing the data. Currently I don't need to run any regressions; I only need some descriptive statistics about the data frame.

I searched a bit and found two packages: ff and filehash. I read the filehash manual first and it seems simple: I just add some code to import the data frame into a file, and the rest seems similar to the usual R operations.
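To make "some code to import the data frame into a file" concrete, here is a minimal sketch of that workflow. It assumes the filehash functions `dumpDF()` and `db2env()`; the small data frame, its column names, and the temporary path are hypothetical stand-ins for one of the 12 real files.

```r
library(filehash)

# Hypothetical small data frame standing in for one of the 12 files
df <- data.frame(id = 1:5, value = c(2.5, 3.1, 4.7, 1.2, 9.9))

# Hypothetical location for the on-disk database
db_path <- tempfile("filehash_demo_")

# dumpDF() writes each column of the data frame to disk under its own key
dumpDF(df, dbName = db_path)

# db2env() exposes the stored columns through an environment;
# a column is read from disk only when it is accessed
env <- db2env(db = db_path)

# Descriptive statistics then look like ordinary R code
mean_value <- with(env, mean(value))
```

Note that in this sketch the data frame still has to be read into memory once before it is dumped, so each of the 12 files would have to fit in RAM on its own.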

I haven't tried ff yet, as it comes with lots of different classes, and I wonder whether it is worth investing time in understanding ff itself before my real work begins. On the other hand, the filehash package seems to have been static for some time and there is little discussion about it, so I wonder whether filehash has become less popular, or even obsolete.

Can anyone help me choose which package to use? Or can anyone tell me the differences and pros and cons between them? Thanks.

Update 01

I am currently using filehash to import the data frame, and I realize that a data frame imported with filehash should be considered read-only: further modifications to the data frame are not stored back to the file unless you save it again. That is not very convenient in my view, as I need to remind myself to do the saving. Any comments on this?
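The behaviour described above can be reproduced with a small sketch. It assumes the filehash functions `dbCreate()`, `dbInit()`, `dbInsert()`, and `dbFetch()`; the key name `"df"` and the toy data frame are hypothetical.

```r
library(filehash)

# Hypothetical on-disk database
db_path <- tempfile("filehash_demo_")
dbCreate(db_path)
db <- dbInit(db_path)

# Store a toy data frame under the key "df"
dbInsert(db, "df", data.frame(x = 1:3))

# dbFetch() returns an in-memory copy of the stored object
local_copy <- dbFetch(db, "df")
local_copy$y <- local_copy$x * 2  # changes the copy only

# The object on disk is unchanged at this point
ncol_before <- ncol(dbFetch(db, "df"))

# The modification is persisted only after an explicit re-insert
dbInsert(db, "df", local_copy)
ncol_after <- ncol(dbFetch(db, "df"))
```

In this sketch `ncol_before` reflects the original single column and `ncol_after` the modified object, which is the "save it again" step mentioned above.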

Source: https://stackoverflow.com/questions/9918459/difference-between-ff-and-filehash-package-in-r

Tags

import

bigdata

filehash