When is plyr better than data.table? [closed]

流过昼夜 submitted on 2019-12-02 11:51:32

They are different packages with different purposes. Neither is a substitute for the other, although they overlap in a small subset of functionality.

Here is a brief summary of each package, taken from the packages' own descriptions:

The plyr package is a set of clean and consistent tools that implement the split-apply-combine pattern in R. This is an extremely common pattern in data analysis: you solve a complex problem by breaking it down into small pieces, doing something to each piece and then combining the results back together again.

and

data.table ... offers fast subset, fast grouping, fast update, fast ordered joins and list columns in a short and flexible syntax, for faster development. It is inspired by A[B] syntax in R where A is a matrix and B is a 2-column matrix.

Where they overlap is in the "fast grouping", which plyr also does by splitting data.frames, operating on the pieces, and recombining them into a single data.frame. data.table has many other features that make operations on data.frame-like structures fast; plyr has features that apply the split-apply-combine paradigm to other data structures, such as lists and arrays (both as inputs and outputs).
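To make the overlap concrete, here is a minimal sketch of the same grouped mean written with each package. The data frame `df` and its column names are made up for illustration, and it assumes both packages are installed.

```r
library(plyr)
library(data.table)

# Toy data, invented purely for this illustration
df <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 2, 4, 6)
)

# plyr: split df by group, summarise each piece, recombine into a data.frame
plyr_result <- ddply(df, "group", summarise, mean_value = mean(value))

# data.table: the same grouped mean, using the by argument inside [ ]
dt <- as.data.table(df)
dt_result <- dt[, list(mean_value = mean(value)), by = group]
```

Both calls produce one row per group; which one reads better is largely a matter of taste for a task this small.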

So, really, they are two different tools with a small area of overlap where they address the same problem. Each does much more than that, and if you want or need that additional functionality, use the package that provides it.
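As a rough sketch of the functionality outside the overlap, the following shows plyr working on a list (with a data.frame or an array as output) and data.table doing an update by reference and a keyed join. All object names and values here (`samples`, `orders`, `customers`, and so on) are hypothetical.

```r
library(plyr)
library(data.table)

# plyr beyond data.frames: a list goes in, a data.frame comes out (ldply)
samples <- list(x = c(1, 2, 3), y = c(10, 20))
stats <- ldply(samples, function(v) data.frame(n = length(v), total = sum(v)))

# plyr again: a list goes in, an array (here a plain vector) comes out (laply)
lens <- laply(samples, length)

# data.table: add a column by reference, without copying the table
orders <- data.table(id = c(1, 2, 3), amount = c(5, 7, 9))
orders[, amount_doubled := amount * 2]

# data.table: keyed join, looking up each customer id in orders
customers <- data.table(id = c(1, 2), name = c("Ann", "Bob"))
setkey(orders, id)
setkey(customers, id)
joined <- orders[customers]
```

Neither half of this example has a natural counterpart in the other package, which is the point of the paragraph above.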
