I recently discovered binary search in data.table. If the table is sorted on multiple keys, is it possible to search on the 2nd key only?
DT = dat
Based on this email thread I wrote the following functions:
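create_index = function(dt, ..., verbose = getOption("datatable.verbose")) {
  # getdots() is an unexported data.table helper; it returns the
  # unquoted column names passed via ... as a character vector
  cols = data.table:::getdots()
  # copy just the indexed columns, then record each row's position in dt
  res = dt[, cols, with=FALSE]
  res[, i:=seq_len(nrow(dt))]
  # key the copy on the indexed columns; setkeyv() returns it invisibly
  setkeyv(res, cols, verbose = verbose)
}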
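JI = function(index, ...) {
  # binary search on the index key, then return the matching row numbers
  # of the original table (join first, extract i after, so this also works
  # in data.table versions where index[J(...), i] already returns a vector)
  index[J(...)]$i
}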
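To make the idea concrete, here is a minimal sketch of the same technique written out by hand on a small table. The toy `DT` and the names `idx`, `row`, `rows` and `res` are assumptions for illustration only; they are not the table or the helpers from the timings.

```r
library(data.table)

# Toy table (an assumption for illustration), keyed on x then y
DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
setkey(DT, x, y)

# Secondary index: a copy of y plus each row's position in DT, keyed on y
idx <- DT[, .(y, row = seq_len(.N))]
setkey(idx, y)

# Binary search on the index, then subset DT by the row numbers it returns
rows <- idx[.(3)]$row
res <- DT[rows]

# The indexed lookup selects the same rows as a vector scan on y
stopifnot(identical(res$v, DT[y == 3]$v))
```

The index is a second keyed table, so it costs extra memory (one copy of the indexed column plus an integer column) and goes stale if `DT` is reordered or modified afterwards.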
Here are the results on my system with a larger DT (1e8 rows):
> system.time(DT[J("c")])
   user  system elapsed
  0.168   0.136   0.306
> system.time(DT[J(unique(x), 25)])
   user  system elapsed
  2.472   1.508   3.980
> system.time(DT[y==25])
   user  system elapsed
  4.532   2.149   6.674
> system.time(IDX_y <- create_index(DT, y))
   user  system elapsed
  3.076   2.428   5.503
> system.time(DT[JI(IDX_y, 25)])
   user  system elapsed
  0.512   0.320   0.831
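Building the index costs about 5.5 s up front, so it is worth it if you use the index multiple times: each indexed lookup then takes about 0.8 s, compared with about 4 s for J(unique(x), 25) and about 6.7 s for the vector scan.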
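As a rough check on when the index pays for itself, the sketch below works out the break-even number of lookups from the elapsed times above. It assumes those single-run timings are representative, which they may not be on other hardware or data; the variable and function names are made up for this calculation.

```r
# Elapsed times in seconds, copied from the timings above
build_index   <- 5.503  # create_index(DT, y), paid once
lookup_index  <- 0.831  # DT[JI(IDX_y, 25)], paid per query
lookup_unique <- 3.980  # DT[J(unique(x), 25)], paid per query
lookup_scan   <- 6.674  # DT[y == 25], paid per query

# Smallest whole number of queries n for which
#   build_index + n * lookup_index < n * alternative
break_even <- function(alternative) {
  floor(build_index / (alternative - lookup_index)) + 1
}

break_even(lookup_scan)    # 1: already faster than one vector scan
break_even(lookup_unique)  # 2: faster than J(unique(x), 25) from the second query on
```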