Why were pandas merges in Python faster than data.table merges in R in 2012?

攒了一身酷 2020-12-22 14:50

I recently came across the pandas library for Python, which according to this benchmark performs very fast in-memory merges. It's even faster than the data.table package i

4 Answers
  •  谎友^ (OP)
    2020-12-22 15:28

    This topic is two years old, but it seems like a probable place for people to land when they search for comparisons of Pandas and data.table.

    Since both of these have evolved over time, I want to post a more recent comparison (from 2014) here for interested readers: https://github.com/Rdatatable/data.table/wiki/Benchmarks-:-Grouping

    It would be interesting to know if Wes and/or Matt (who, by the way, are creators of Pandas and data.table respectively and have both commented above) have any news to add here as well.

    -- UPDATE --

    A comment posted below by jangorecki contains a link that I think is very useful: https://github.com/szilard/benchm-databases

    This graph depicts the average times of aggregation and join operations for different technologies (lower = faster; comparison last updated in Sept 2016). It was really educational for me.

    Going back to the question: R DT key and R DT refer to the keyed and unkeyed flavors of R's data.table, and both happen to be faster in this benchmark than Python's Pandas (Py pandas).
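
    For readers unfamiliar with the keyed/unkeyed distinction, here is a minimal sketch of the two join styles on the pandas side. The toy tables and column names are hypothetical, and treating a sorted index as the counterpart of a data.table key is only a loose analogy assumed for illustration, not how the benchmark itself was run.

```python
import pandas as pd

# Hypothetical toy tables; names and values are illustrative only.
left = pd.DataFrame({"id": [1, 2, 3, 4], "x": [10.0, 20.0, 30.0, 40.0]})
right = pd.DataFrame({"id": [2, 3, 4, 5], "y": ["b", "c", "d", "e"]})

# "Unkeyed" style: join on an ordinary column.
merged_cols = left.merge(right, on="id", how="inner")

# "Keyed" style (loose analogue, an assumption): make the join column
# a sorted index on both tables, then join index-to-index.
left_idx = left.set_index("id").sort_index()
right_idx = right.set_index("id").sort_index()
merged_idx = left_idx.join(right_idx, how="inner")

# Both styles should select the same matching ids.
print(merged_cols["id"].tolist())
print(merged_idx.index.tolist())
```

    To compare speed on your own data, wrap each join in a timer such as `time.perf_counter()` and use tables large enough for the difference to be measurable.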
