Taking a data.table slice with a sequence of (row,col) indices

Question

I have a data.table that resembles the one below.

library(data.table)
tab <- data.table(a = c(NA, 42190, NA), b = c(42190, 42190, NA), c = c(40570, 42190, NA))
tab
       a     b     c
1:    NA 42190 40570
2: 42190 42190 42190
3:    NA    NA    NA

Given a vector of row indices and a vector of column indices of equal length, I would like to get back a vector containing the elements of tab at the corresponding (row, column) positions.

For example, suppose I wanted to get the diagonal elements in tab. I would specify two vectors,

ri <- 1:3
ci <- 1:3

and some function, function(ri, ci, tab), would return the diagonal elements of tab.

If tab were a data.frame, I would do what's below,

as.data.frame(tab)[cbind(ri, ci)]

but I would like to avoid data.frame syntax. I would also like to avoid a for loop, as loops tend to be slow.

Answer 1:

There is a faster way to do this than coercing to either matrix or data.frame. Just call the `[.data.frame` method directly.

`[.data.frame`(tab, cbind(ri, ci))
[1]    NA 42190    NA

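This is the functional form of the `[.data.frame` method: it is called by name like an ordinary function, with tab as the first argument, instead of through the bracket operator.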
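The question asks for something of the form function(ri, ci, tab). A minimal sketch of such a wrapper around the call above might look like the following; the name get_cells is hypothetical and is not part of data.table or base R.

```r
library(data.table)

tab <- data.table(a = c(NA, 42190, NA),
                  b = c(42190, 42190, NA),
                  c = c(40570, 42190, NA))

# Hypothetical helper (not a data.table function): returns the element at
# (ri[k], ci[k]) for each k, without an explicit loop.
get_cells <- function(ri, ci, tab) {
  stopifnot(length(ri) == length(ci))
  # Call the data.frame method by name so that data.table's own `[` is bypassed.
  `[.data.frame`(tab, cbind(ri, ci))
}

diag_vals <- get_cells(1:3, 1:3, tab)  # diagonal of tab
print(diag_vals)
```

The length check is only a guard for this sketch; cbind would otherwise recycle the shorter vector silently.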

Answer 2:

(UPDATE: @42-'s answer using `[.data.frame` is best, but here is my previous answer.)

as.matrix(tab)[cbind(ri, ci)]

is going to be faster and more memory-efficient than melt.

I see no reason not to store your data as a matrix in the first place, as @thelatemail recommends. This is one case where data.table syntax is not as powerful as matrix syntax.
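As a rough illustration of the matrix route with indices that are not on the diagonal, the sketch below converts once and then indexes with a two-column matrix of (row, column) pairs. The index values here are made up for the example. It assumes every column has the same type; with mixed column types as.matrix would coerce everything to a common type such as character.

```r
library(data.table)

tab <- data.table(a = c(NA, 42190, NA),
                  b = c(42190, 42190, NA),
                  c = c(40570, 42190, NA))

# Convert once; all columns are numeric, so m is a numeric matrix.
m <- as.matrix(tab)

# Illustrative (row, column) pairs: (1, 2), (2, 3), (1, 3).
ri <- c(1, 2, 1)
ci <- c(2, 3, 3)

# Each row of cbind(ri, ci) selects a single cell of m.
vals <- m[cbind(ri, ci)]
print(vals)
```

If many lookups are needed, keeping m around and reusing it avoids repeating the conversion.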

(For memory efficiency with large data.tables, data.table has the functions setDF and setDT, which convert to and from data.frame without copying, but I'm not aware of an equivalent for matrix. If that is something people do a lot, it might make a good enhancement request for data.table.

For really big dimensions, you might look into the sparse-matrix formats in the Matrix package, or chunk your data, or use disk-backed data structures.)
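A minimal sketch of the setDF/setDT round trip mentioned above, assuming it is acceptable for tab to be a plain data.frame for a moment. Both functions modify tab by reference, so neither conversion copies the table.

```r
library(data.table)

tab <- data.table(a = c(NA, 42190, NA),
                  b = c(42190, 42190, NA),
                  c = c(40570, 42190, NA))
ri <- 1:3
ci <- 1:3

setDF(tab)                   # tab is now a plain data.frame (changed by reference)
vals <- tab[cbind(ri, ci)]   # ordinary data.frame matrix indexing
setDT(tab)                   # convert back to a data.table (by reference)

print(vals)
print(is.data.table(tab))
```

This only avoids copying for the class conversion itself; whether it pays off compared with the other approaches would need to be measured on data of realistic size.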

Source: https://stackoverflow.com/questions/50635084/taking-a-data-table-slice-with-a-sequence-of-row-col-indices

Tags

indexing

data.table

matrix-indexing