For two logical vectors, `x` and `y`, of length > 1E8, what is the fastest way to calculate the 2x2 cross tabulation?
I suspect the answer is to write one's own logical functions.
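For concreteness, the four counts in question are the cells of `table(x, y)`. A toy illustration (not part of the benchmark):

```r
# Toy example of the 2x2 cross tabulation being computed
a <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
b <- c(TRUE, FALSE, TRUE, FALSE, FALSE)
table(a, b)
#        b
# a       FALSE TRUE
#   FALSE     1    1
#   TRUE      2    1
```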
Here are results for the logical method, `table`, and `bigtabulate`, for N = 3E8:
```
         test replications elapsed relative user.self sys.self
2     logical            1  23.861 1.000000     15.36     8.50
3 bigtabulate            1  36.477 1.528729     28.04     8.43
1       table            1 184.652 7.738653    150.61    33.99
```
In this case, `table` is a disaster.
For comparison, here is N = 3E6:
```
         test replications elapsed relative user.self sys.self
2     logical            1   0.220 1.000000      0.14     0.08
3 bigtabulate            1   0.534 2.427273      0.45     0.08
1       table            1   1.956 8.890909      1.87     0.09
```
At this point, it seems that writing one's own logical functions is best, even though that abuses `sum` and examines each logical vector multiple times. I've not yet tried compiling the functions, but that should yield better results.
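As a sketch of the compilation idea (untested here; gains are typically modest for functions already dominated by vectorized primitives), the byte compiler in base R's `compiler` package can be applied directly to the functions defined in the listing below:

```r
# Byte-compile the naive function; func_logical is defined in the
# full code listing further down.
library(compiler)
func_logical_cmp <- cmpfun(func_logical)
res <- func_logical_cmp(x, y)  # identical results
```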
**Update 1**

If we give `bigtabulate` values that are already integers, i.e. if we do the type conversion `1 * cbind(v1, v2)` outside of `bigtabulate`, then the N = 3E6 multiple is 1.80, instead of 2.4. The N = 3E8 multiple relative to the "logical" method is only 1.21, instead of 1.53.
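In other words, the conversion cost is paid once, up front, rather than inside the timed call; this is what `func_bigtabulate2` in the listing below exercises via `x1` and `y1`:

```r
# Hoist the logical -> integer conversion out of the timed call
x1 <- 1 * x   # conversion paid once, up front
y1 <- 1 * y
res <- bigtabulate(cbind(x1, y1), ccols = c(1, 2))
```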
**Update 2**

As Joshua Ulrich has pointed out, converting to bit vectors is a significant improvement: we're allocating and moving around a LOT less data. R's logical vectors consume 4 bytes per entry ("Why?", you may ask... Well, I don't know, but an answer may turn up here.), whereas a bit vector consumes only one bit per entry, i.e. 1/32 as much data. So `x` consumes 1.2e9 bytes, while `xb` (the bit version in the code below) consumes only 3.75e7 bytes.
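Those figures can be checked directly (sizes are approximate, since `object.size` also counts a few bytes of attribute overhead):

```r
object.size(x)   # ~1.2e9 bytes: 4 bytes per logical entry
object.size(xb)  # ~3.75e7 bytes: 1 bit per entry
```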
I've dropped `table` and the `bigtabulate` variations from the updated benchmarks (N = 3E8). Note that `logicalB1` assumes that the data is already a bit vector, while `logicalB2` is the same operation with the penalty for type conversion. As my logical vectors are the results of operations on other data, I don't have the benefit of starting off with a bit vector. Nonetheless, the penalty to be paid is relatively small. [The "logical3" series only performs 3 logical operations, and then does a subtraction. Since it's a cross-tabulation, we know the total, as DWin has remarked.]
```
        test replications elapsed  relative user.self sys.self
4 logical3B1            1   1.276  1.000000      1.11     0.17
2  logicalB1            1   1.768  1.385580      1.56     0.21
5 logical3B2            1   2.297  1.800157      2.15     0.14
3  logicalB2            1   2.782  2.180251      2.53     0.26
1    logical            1  22.953 17.988245     15.14     7.82
```
We've now sped this up to only 1.8-2.8 seconds, even with many gross inefficiencies. There is no doubt it should be feasible to do this in well under 1 second, with changes including one or more of: C code, compilation, and multicore processing (a sketch of the multicore idea follows below). After all, the 3 (or 4) different logical operations could be done independently, even though that still wastes compute cycles.
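As a sketch of that multicore idea (fork-based `parallel`, so Unix-alikes only; not benchmarked here), the three explicit sums are independent and can run in separate processes, with the fourth cell again recovered by subtraction:

```r
library(parallel)
# Run the 3 required sums in separate forked processes
counts <- unlist(mclapply(1:3, function(i) {
    switch(i,
           sum(xb & yb),
           sum(xb & !yb),
           sum(!xb & yb))
}, mc.cores = 3))
res <- c(counts, length(xb) - sum(counts))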
The most similar of the best challengers, `logical3B2`, is about 80X faster than `table` and about 10X faster than the naive logical operation. And it still has a lot of room for improvement.
Here is the code to produce the above. NOTE: I recommend commenting out some of the operations or vectors unless you have a lot of RAM; the creation of `x`, `x1`, and `xb`, along with the corresponding `y` objects, takes up a fair bit of memory.
Also, note: I should have used `1L` as the integer multiplier for `bigtabulate`, instead of just `1`. At some point I will re-run with this change, and would recommend that change to anyone who uses the `bigtabulate` approach.
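For reference, that correction would look like the following (a hypothetical variant name, not re-benchmarked):

```r
# 1L keeps the matrix integer instead of promoting it to double
func_bigtabulate_int <- function(v1, v2) {
    return(bigtabulate(1L * cbind(v1, v2), ccols = c(1, 2)))
}
```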
```r
library(rbenchmark)
library(bigtabulate)
library(bit)

set.seed(0)
N <- 3E8
p <- 0.02

x <- sample(c(TRUE, FALSE), N, prob = c(p, 1 - p), replace = TRUE)
y <- sample(c(TRUE, FALSE), N, prob = c(p, 1 - p), replace = TRUE)
x1 <- 1 * x      # integer copies, for bigtabulate2
y1 <- 1 * y
xb <- as.bit(x)  # bit copies, for the B1 variants
yb <- as.bit(y)

func_table <- function(v1, v2) {
    return(table(v1, v2))
}

# Naive method: 4 logical operations, each scanning both vectors
func_logical <- function(v1, v2) {
    return(c(sum(v1 & v2), sum(v1 & !v2), sum(!v1 & v2), sum(!v1 & !v2)))
}

# Same, but paying the logical -> bit conversion inside the function
func_logicalB <- function(v1, v2) {
    v1B <- as.bit(v1)
    v2B <- as.bit(v2)
    return(c(sum(v1B & v2B), sum(v1B & !v2B), sum(!v1B & v2B), sum(!v1B & !v2B)))
}

func_bigtabulate <- function(v1, v2) {
    return(bigtabulate(1 * cbind(v1, v2), ccols = c(1, 2)))
}

# Variant that expects the caller to have done the integer conversion
func_bigtabulate2 <- function(v1, v2) {
    return(bigtabulate(cbind(v1, v2), ccols = c(1, 2)))
}

# Only 3 logical operations; the 4th cell is obtained by subtraction
func_logical3 <- function(v1, v2) {
    r1 <- sum(v1 & v2)
    r2 <- sum(v1 & !v2)
    r3 <- sum(!v1 & v2)
    r4 <- length(v1) - sum(c(r1, r2, r3))
    return(c(r1, r2, r3, r4))
}

func_logical3B <- function(v1, v2) {
    v1B <- as.bit(v1)
    v2B <- as.bit(v2)
    r1 <- sum(v1B & v2B)
    r2 <- sum(v1B & !v2B)
    r3 <- sum(!v1B & v2B)
    r4 <- length(v1) - sum(c(r1, r2, r3))
    return(c(r1, r2, r3, r4))
}

benchmark(replications = 1, order = "elapsed",
    # table      = {res <- func_table(x, y)},
    logical    = {res <- func_logical(x, y)},
    logicalB1  = {res <- func_logical(xb, yb)},
    logicalB2  = {res <- func_logicalB(x, y)},
    logical3B1 = {res <- func_logical3(xb, yb)},
    logical3B2 = {res <- func_logical3B(x, y)}
    # bigtabulate  = {res <- func_bigtabulate(x, y)},
    # bigtabulate2 = {res <- func_bigtabulate2(x1, y1)}
)
```
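Before committing to a long run, a cheap agreement check at small N is worthwhile; a sketch reusing the functions above on a subsample:

```r
# All methods should return the same four cell counts
xs <- x[1:1e5]
ys <- y[1:1e5]
stopifnot(all(func_logical(xs, ys) == func_logicalB(xs, ys)))
stopifnot(all(func_logical(xs, ys) == func_logical3(xs, ys)))
```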