For example (though I'm not sure it's the most representative one):
N <- 1e6
d1 <- data.frame(x=sample(N,N), y1=rnorm(N))
d2 <- data.frame(x=sample(N,N),
For a simple task (unique key values on both sides of the join) I use match:
system.time({
    d <- d1
    d$y2 <- d2$y2[match(d1$x,d2$x)]
})
It's far faster than merge (0.13s versus 3.37s on my machine).
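To check that the match approach really gives the same join as merge when keys are unique, here is a minimal sketch on a smaller data set. The value of N, the seed, the `y2 = rnorm(N)` column, and the helper names (`idx`, `d_match`, `d_merge`, `d2_short`) are my own assumptions for illustration, not part of the timings above.

```r
# Small illustrative data; y2 = rnorm(N) is an assumption made for this sketch.
set.seed(42)
N <- 1000
d1 <- data.frame(x = sample(N, N), y1 = rnorm(N))
d2 <- data.frame(x = sample(N, N), y2 = rnorm(N))

# match() returns, for each key in d1, the row position of that key in d2.
idx <- match(d1$x, d2$x)
d_match <- d1
d_match$y2 <- d2$y2[idx]

# merge() sorts its result by the key by default, so order the
# match() result the same way before comparing the joined column.
d_merge <- merge(d1, d2, by = "x")
d_match_sorted <- d_match[order(d_match$x), ]
same_y2 <- identical(d_merge$y2, d_match_sorted$y2)

# Keys missing from the right-hand side come back as NA,
# which resembles merge(..., all.x = TRUE) rather than an inner join.
d2_short <- d2[d2$x <= N / 2, ]
n_missing <- sum(is.na(d2_short$y2[match(d1$x, d2_short$x)]))
```

Note that match only picks the first matching row, so this shortcut is only a drop-in replacement for merge when the key is unique in the second data frame.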
My timings:
merge
plyr