Fastest way to subset - data.table vs. MySQL

前端未结

关注

 2  926

情话喂你 2020-12-14 08:45

I\'m an R user, and I frequently find that I need to write functions that require subsetting large datasets (10s of millions of rows). When I apply such functions over a la

2条回答

温柔的废话 (楼主)

2020-12-14 09:02

If the data fits in RAM, data.table is faster. If you provide an example it will probably become evident, quickly, that you're using data.table badly. Have you read the "do's and don'ts" on the data.table wiki?

SQL has a lower bound because it is a row store. If the data fits in RAM (and 64bit is quite a bit) then data.table is faster not just because it is in RAM but because columns are contiguous in memory (minimising page fetches from RAM to L2 for column operations). Use data.table correctly and it should be faster than SQL's lower bound. This is explained in FAQ 3.1. If you're seeing slower with data.table, then chances are very high that you're using data.table incorrectly (or there's a performance bug that we need to fix). So, please post some tests, after reading the data.table wiki.

0 讨论(0)

查看其它2个回答
发布评论:

提交评论
- 加载中...