Is there a fast way to iterate through combinations like those returned by expand.grid or CJ (data.table)? These get too big to fit in memory.
I think you'll get better performance if you give each worker a chunk of one of the data frames, have each worker perform its share of the computations, and then combine the results. That way the work is spread across the workers and each one only ever builds the grid for its own chunk, which keeps memory usage down.
Here is an example that uses the isplitRows function from the itertools package:
library(doParallel)
library(itertools)

dim1 <- 10
dim2 <- 100
df1 <- data.frame(a = 1:dim1, b = 1:dim1)
df2 <- data.frame(x = 1:dim2, y = 1:dim2, z = 1:dim2)

# The function applied to each pair of rows
f <- function(...) sum(...)

nw <- 4  # number of workers
cl <- makeCluster(nw)
registerDoParallel(cl)

# Each worker gets one chunk of df2 and crosses it with all of df1,
# so the full grid is never materialized in any single process
res <- foreach(d2 = isplitRows(df2, chunks = nw), .combine = c) %dopar% {
  expgrid <- expand.grid(x = seq(dim1), y = seq(nrow(d2)))
  apply(expgrid, 1, function(i) f(df1[i[["x"]], ], d2[i[["y"]], ]))
}

stopCluster(cl)
I split df2 because it has more rows, but you could choose either.
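For what it's worth, at this size the full grid still fits in memory, so you can sanity-check the chunked result against a plain serial version (this comparison is just an illustration, not something the chunked approach needs). Since foreach combines the chunks in task order, the two vectors should match element for element:

# Illustrative check only: build the full grid and compute serially
full <- expand.grid(x = seq(dim1), y = seq(dim2))
res_serial <- apply(full, 1, function(i) f(df1[i[["x"]], ], df2[i[["y"]], ]))
all.equal(unname(res), unname(res_serial))  # should be TRUE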