Suppose I have two matrices, each with two columns and differing numbers of rows. I want to check which pairs (rows) of one matrix are in the other matrix. If these were one
Coming in late to the game: I had previously written an algorithm using the "paste with delimiter" method, and then found this page. I was guessing that one of the code snippets here would be the fastest, but:
# using Andrie's %inm% operator exactly as above
andrie <- function(mfoo, nfoo) apply(mfoo, 1, `%inm%`, nfoo)
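carl <- function(mfoo, nfoo) {
  # collapse each row into a single "_"-delimited string
  allrows <- unlist(sapply(1:nrow(mfoo), function(j) paste(mfoo[j, ], collapse = '_')))
  allfoo <- unlist(sapply(1:nrow(nfoo), function(j) paste(nfoo[j, ], collapse = '_')))
  # row strings of mfoo that do not occur in nfoo
  thewalls <- setdiff(allrows, allfoo)
  # return the corresponding rows of mfoo
  mfoo[allrows %in% thewalls, ]
}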
library(digest)  # provides digest(), used here to hash each row
ramnath <- function(a, x) apply(a, 1, digest) %in% apply(x, 1, digest)
# test data: a 100 x 4 matrix, and 60 of its rows picked at random
mfoo <- matrix(sample(1:100, 400, replace = TRUE), nrow = 100)
nfoo <- mfoo[sample(1:100, 60), ]
library(microbenchmark)
microbenchmark(andrie(mfoo,nfoo),carl(mfoo,nfoo),ramnath(mfoo,nfoo),times=5)
Unit: milliseconds
expr                      min        lq    median        uq        max neval
andrie(mfoo, nfoo)  25.564196 26.527632 27.964448 29.687344 102.802004     5
carl(mfoo, nfoo)     1.020310  1.079323  1.096855  1.193926   1.246523     5
ramnath(mfoo, nfoo)  8.176164  8.429318  8.539644  9.258480   9.458608     5
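So apparently constructing character strings and doing a single set operation is fastest! On this test its median time (about 1.1 ms) is roughly 8 times faster than the digest version and roughly 25 times faster than the %inm% version. (PS: I checked, and all 3 algorithms give the same result.)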
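For anyone who wants the "paste with delimiter" idea as a plain membership test, here is a minimal sketch that returns one TRUE/FALSE per row, in the same shape as the `%in%`-based versions. The name `rows_in` and the `sep` argument are my own, not from any of the answers. It also assumes the delimiter never occurs inside a cell: with character data, a row c("a_b", "c") and a row c("a", "b_c") would collapse to the same string under "_".

```r
# Hypothetical helper (name and arguments are assumptions, not from the thread):
# returns TRUE for each row of `a` that also appears as a row of `b`.
rows_in <- function(a, b, sep = "_") {
  # collapse every row into one delimited string
  key_a <- apply(a, 1, paste, collapse = sep)
  key_b <- apply(b, 1, paste, collapse = sep)
  # a single vectorised set-membership test on the strings
  key_a %in% key_b
}

a <- matrix(c(1, 2,
              3, 4,
              5, 6), ncol = 2, byrow = TRUE)
b <- matrix(c(3, 4,
              1, 2,
              7, 8), ncol = 2, byrow = TRUE)

rows_in(a, b)
# [1]  TRUE  TRUE FALSE
```

Indexing with the result, e.g. `a[!rows_in(a, b), , drop = FALSE]`, should give the rows of `a` that are missing from `b`.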