With a 2-column data.table, I'd like to summarize the pairwise relationships in column 1 by summing the number of shared elements in column 2. In other words,
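How about this one, using foverlaps()? Each run of consecutive Y values within an X is collapsed into a single [start, end] interval, and the overlap length of two intervals is the number of Y values they share. The more consecutive values of Y you have for each X, the fewer rows this will produce compared to a cartesian join.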
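library(data.table)
d = data.table(X=c(1,1,1,2,2,2,2,3,3,3,4,4), Y=c(1,2,3,1,2,3,4,1,5,6,4,5))
setorder(d, X, Y)  # Y must be sorted within X for the run detection below
# id labels each run of consecutive Y values within an X
d[, id := cumsum(c(0L, diff(Y)) != 1L), by=X]
# collapse each run into a single [start, end] interval
dd = d[, .(start=Y[1L], end=Y[.N]), by=.(X,id)][, id := NULL][]
# overlap join of the intervals against themselves
ans <- foverlaps(dd, setkey(dd, start, end))
# overlap length = number of shared Y values for that pair of intervals
ans[, count := pmin(abs(i.end-start+1L), abs(end-i.start+1L),
                    abs(i.end-i.start+1L), abs(end-start+1L))]
ans[, .(count = sum(count)), by=.(X, i.X)][order(i.X, X)]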
#     X i.X count
#  1: 1   1     3
#  2: 2   1     3
#  3: 3   1     1
#  4: 1   2     3
#  5: 2   2     4
#  6: 3   2     1
#  7: 4   2     1
#  8: 1   3     1
#  9: 2   3     1
# 10: 3   3     3
# 11: 4   3     1
# 12: 2   4     1
# 13: 3   4     1
# 14: 4   4     2
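For comparison, the cartesian join mentioned above could look roughly like this. It is only a sketch on the same toy data, not part of the original approach, and the names cj and res_cj are made up for illustration.

```r
library(data.table)

# Same toy data as in the answer
d <- data.table(X = c(1,1,1,2,2,2,2,3,3,3,4,4),
                Y = c(1,2,3,1,2,3,4,1,5,6,4,5))

# Self-join on Y: every pair of X values that shares a Y yields one row.
# allow.cartesian = TRUE is needed because the result can have more rows
# than nrow(x) + nrow(i).
cj <- d[d, on = "Y", allow.cartesian = TRUE]

# Count the shared Y values per (X, i.X) pair
res_cj <- cj[, .(count = .N), by = .(X, i.X)][order(i.X, X)]
res_cj
```

The join materializes one row per shared Y value before aggregating, which is the cost the interval approach avoids when Y has long consecutive runs.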
Note: make sure X and Y are integers for faster results. This is because joins on integer types are faster than on double types (foverlaps performs binary joins internally).
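One way to do the conversion, as a small sketch that assumes X and Y hold whole numbers stored as doubles (the name cols is only illustrative):

```r
library(data.table)

# Toy data stored as doubles, as in the answer
d <- data.table(X = c(1,1,1,2,2,2,2,3,3,3,4,4),
                Y = c(1,2,3,1,2,3,4,1,5,6,4,5))

# Convert both columns to integer by reference.
# Assumption: the values are whole numbers, so nothing is truncated.
cols <- c("X", "Y")
d[, (cols) := lapply(.SD, as.integer), .SDcols = cols]

sapply(d, class)
```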
You can make this more memory efficient by using which=TRUE in foverlaps() and using the indices to generate count in the next step.
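A rough sketch of that which=TRUE variant, under the assumption that the data is stored as integers. Here foverlaps() returns only the matching row numbers (columns xid and yid), and the counts are computed from those indices. The names idx and res are illustrative.

```r
library(data.table)

# Same toy data, stored as integers
d <- data.table(X = c(1L,1L,1L,2L,2L,2L,2L,3L,3L,3L,4L,4L),
                Y = c(1L,2L,3L,1L,2L,3L,4L,1L,5L,6L,4L,5L))
setorder(d, X, Y)
d[, id := cumsum(c(0L, diff(Y)) != 1L), by = X]
dd <- d[, .(start = Y[1L], end = Y[.N]), by = .(X, id)][, id := NULL][]
setkey(dd, start, end)

# Only the row indices of overlapping intervals are returned,
# so the joined columns are never copied
idx <- foverlaps(dd, dd, which = TRUE)

# Overlap length = min(end) - max(start) + 1, looked up via the indices
idx[, count := pmin(dd$end[xid], dd$end[yid]) -
               pmax(dd$start[xid], dd$start[yid]) + 1L]
idx[, `:=`(X = dd$X[yid], i.X = dd$X[xid])]

res <- idx[, .(count = sum(count)), by = .(X, i.X)][order(i.X, X)]
res
```

Because the intervals are known to overlap, min(end) - max(start) + 1 gives the same value as the four-way pmin() used above.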