Difference between the ff and filehash packages in R [closed]

Submitted by 人盡茶涼 on 2020-01-14 08:19:25

Question


I have a data frame of 25 columns and ~1M rows, split across 12 files. I need to import them and then use a reshaping package for some data management. The data is too large to handle comfortably in RAM, so I am looking for a "non-RAM" solution for importing and processing it. For now I don't need to run any regressions; I only need some descriptive statistics about the data frame.

I searched a bit and found two packages: ff and filehash. I read the filehash manual first and it seemed simple: I only had to add some code to import the data frame into a file, and the rest looks much like ordinary R operations.
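A minimal sketch of that import step, assuming the filehash functions `dbCreate()`, `dbInit()`, `dbInsert()`, `dbFetch()` and `dbList()`. The file names, columns and one-key-per-file scheme are hypothetical stand-ins for the 12 real files; toy CSVs are generated first so the snippet runs on its own. Note that each stored chunk is still loaded fully into RAM whenever it is fetched.

```r
library(filehash)

# Toy stand-ins for the real CSV files (hypothetical names and columns).
dir <- tempdir()
files <- file.path(dir, sprintf("part%02d.csv", 1:3))
for (i in seq_along(files)) {
  write.csv(data.frame(id = i, value = runif(5)), files[i], row.names = FALSE)
}

# Create and open an on-disk filehash database.
dbName <- file.path(dir, "parts_db")
dbCreate(dbName)
db <- dbInit(dbName)

# Read one file at a time and store it under its own key, so only a
# single chunk has to be held in RAM while importing.
for (f in files) {
  key <- tools::file_path_sans_ext(basename(f))
  dbInsert(db, key, read.csv(f))
}

dbList(db)  # one key per imported file

# Descriptive statistics can then be computed chunk by chunk.
sapply(dbList(db), function(k) mean(dbFetch(db, k)$value))
```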

I haven't tried ff yet because it comes with many different classes, and I wonder whether it is worth investing time in understanding ff itself before my real work begins. On the other hand, filehash seems to have been static for some time and there is little discussion about it, so I wonder whether filehash has become less popular, or even obsolete.

Can anyone help me choose which package to use, or explain the differences and the pros and cons of each? Thanks.

Update 01

I am currently using filehash to import the data frame, and I realize that a data frame imported with filehash should be treated as read-only: any further modification to it is not stored back to the file unless you save it again. That is not very convenient in my view, since I have to remember to do the saving. Any comments on this?
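A small sketch of the behaviour described above, assuming filehash's `dbCreate()`, `dbInit()`, `dbInsert()` and `dbFetch()`; the database name and toy data frame are hypothetical. `dbFetch()` hands back an ordinary in-memory copy, so edits only reach the file once the copy is written back with `dbInsert()` under the same key.

```r
library(filehash)

# Hypothetical database in a temporary directory, with a toy data frame.
dbName <- file.path(tempdir(), "mydb")
dbCreate(dbName)
db <- dbInit(dbName)
dbInsert(db, "df", data.frame(id = 1:3, x = c(10, 20, 30)))

# dbFetch() returns a regular in-memory copy of the stored object.
df <- dbFetch(db, "df")
df$x <- df$x * 2  # changes only the in-memory copy

# Without re-inserting, the database still holds the old values.
dbFetch(db, "df")$x  # 10 20 30

# Writing the modified copy back under the same key persists the change.
dbInsert(db, "df", df)
dbFetch(db, "df")$x  # 20 40 60
```

Wrapping the fetch, modify and re-insert steps in one small helper function is one way to avoid forgetting the final save.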

Source: https://stackoverflow.com/questions/9918459/difference-between-ff-and-filehash-package-in-r
