Why were pandas merges in python faster than data.table merges in R in 2012?

攒了一身酷 2020-12-22 14:50

I recently came across the pandas library for Python, which according to this benchmark performs very fast in-memory merges. It's even faster than the data.table package i

4 answers

谎友^ (original poster)

2020-12-22 15:28

This topic is two years old, but it seems like a probable place for people to land when they search for comparisons of Pandas and data.table.

Since both of these have evolved over time, I want to post a more recent comparison (from 2014) here for interested users: https://github.com/Rdatatable/data.table/wiki/Benchmarks-:-Grouping

It would be interesting to know if Wes and/or Matt (who, by the way, are creators of Pandas and data.table respectively and have both commented above) have any news to add here as well.

-- UPDATE --

A comment posted below by jangorecki contains a link that I think is very useful: https://github.com/szilard/benchm-databases

This graph depicts the average times of aggregation and join operations for different technologies (lower = faster; comparison last updated in Sept 2016). It was really educational for me.

Going back to the question, R DT key and R DT refer to the keyed/unkeyed flavors of R's data.table and happen to be faster in this benchmark than Python's Pandas (Py pandas).
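
For readers who want to try a keyed versus unkeyed comparison on the Python side, here is a minimal sketch. It is not the code behind the linked benchmark. The frame sizes, the column names `k`, `x`, `y`, and the helpers `make_frames` and `time_call` are all made up for illustration, and treating a pandas index as a rough counterpart of a data.table key is my own assumption, not something the benchmark states.

```python
import time

import numpy as np
import pandas as pd


def make_frames(n_rows=100_000, n_keys=10_000, seed=0):
    """Build two hypothetical frames that share an integer key column `k`."""
    rng = np.random.default_rng(seed)
    left = pd.DataFrame({
        "k": rng.integers(0, n_keys, size=n_rows),
        "x": rng.random(n_rows),
    })
    # One row per key on the right side, so the join is many-to-one.
    right = pd.DataFrame({"k": np.arange(n_keys), "y": rng.random(n_keys)})
    return left, right


def time_call(fn):
    """Return (result, elapsed seconds) for a single call of fn."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


left, right = make_frames()

# "Unkeyed" flavor: merge on an ordinary column.
merged_col, t_col = time_call(lambda: left.merge(right, on="k", how="inner"))

# "Keyed" flavor (assumed analogy): move the key into the index first,
# then join the left column `k` against that index.
right_idx = right.set_index("k")
merged_idx, t_idx = time_call(lambda: left.join(right_idx, on="k", how="inner"))

print(f"merge on column: {t_col:.4f}s, join on index: {t_idx:.4f}s")
```

A single timed call like this is noisy, so treat the numbers as a rough indication only and repeat the measurement on your own data before drawing conclusions.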
