Efficient conditional cross join in data.table

笑着哭i submitted on 2021-01-28 14:11:42

Question


EDIT: Sorry, I have modified the code; it had an error that prevented reproduction.

I am trying to efficiently merge with a condition.

Currently I do a full cross join (whose all-combinations behaviour I want to keep), but I also have one condition on a subset of the columns.

Cross join function (from here)

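library(data.table)

# Cross join: give both tables a constant dummy key k = 1, join on it
# (so every row of Y matches every row of X), then drop the dummy column
CJ.table.1 <- function(X, Y)
  setkey(X[, c(k = 1, .SD)], k)[Y[, c(k = 1, .SD)], allow.cartesian = TRUE][, k := NULL]

set.seed(1)
# generate data
x  <- data.table(t = rep(1:10, 2), z = sample(1:10, 20, replace = TRUE))
x2 <- data.table(tprime = rep(1:10, 2), zprime = sample(1:10, 20, replace = TRUE))

joined <- CJ.table.1(x, x2)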
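The dummy-key trick works because every row on both sides gets k = 1, so the keyed join pairs each row of `x2` with all rows of `x`. For comparison, here is a minimal sketch (an alternative technique, not from the original post) that builds the same Cartesian product directly from row indices, assuming the same `x` and `x2` as above. The names `ix`, `ix2` and `joined_idx` are hypothetical.

```r
library(data.table)

set.seed(1)
x  <- data.table(t = rep(1:10, 2), z = sample(1:10, 20, replace = TRUE))
x2 <- data.table(tprime = rep(1:10, 2), zprime = sample(1:10, 20, replace = TRUE))

# Row indices of the Cartesian product: x's rows cycle fastest and each
# row of x2 is repeated once per row of x, i.e. the same row order that
# the X[Y] join inside CJ.table.1 produces
ix  <- rep(seq_len(nrow(x)),  times = nrow(x2))
ix2 <- rep(seq_len(nrow(x2)), each  = nrow(x))

# Bind the repeated rows side by side: columns t, z, tprime, zprime
joined_idx <- cbind(x[ix], x2[ix2])
```

Like `CJ.table.1`, this still materializes all `nrow(x) * nrow(x2)` rows (400 here), so it only clarifies what the cross join does; it does not address the size problem.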


> joined
      t  z tprime zprime
  1:  1  3      1     10
  2:  2  4      1     10
  3:  3  6      1     10
  4:  4 10      1     10
  5:  5  3      1     10
 ---                    
396:  6  5     10      5
397:  7  8     10      5
398:  8 10     10      5
399:  9  4     10      5
400: 10  8     10      5

Then I keep only the rows where tprime is exactly t + 1:

setcolorder(joined, c("t", "tprime", "z", "zprime"))
joined <- joined[tprime == t + 1]

The final desired output is then:

> joined
    t tprime  z zprime
 1: 1      2  3      3
 2: 1      2  3      3
 3: 2      3  4      7
 4: 2      3  2      7
 5: 3      4  6      2
 6: 3      4  7      2
 7: 4      5 10      3
 8: 4      5  4      3
 9: 5      6  3      4
10: 5      6  8      4
11: 6      7  9      1
12: 6      7  5      1
13: 7      8 10      4
14: 7      8  8      4
15: 8      9  7      9
16: 8      9 10      9
17: 9     10  7      4
18: 9     10  4      4
19: 1      2  3      6
20: 1      2  3      6
21: 2      3  4      5
22: 2      3  2      5
23: 3      4  6      2
24: 3      4  7      2
25: 4      5 10      9
26: 4      5  4      9
27: 5      6  3      7
28: 5      6  8      7
29: 6      7  9      8
30: 6      7  5      8
31: 7      8 10      2
32: 7      8  8      2
33: 8      9  7      8
34: 8      9 10      8
35: 9     10  7      5
36: 9     10  4      5
    t tprime  z zprime

The reason I want to apply the condition BEFORE the cross join is that my actual data is huge, so it is inefficient to generate the entire cross product first and THEN prune it down.
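With data this large, it can help to know how many rows the conditional cross join will produce before building anything. A minimal sketch (not from the original question), assuming the same `x` and `x2`: each key value k contributes (rows of `x` with t + 1 = k) × (rows of `x2` with tprime = k) rows, so counting per key, joining the counts and summing the products gives the result size. The names `nx`, `nx2`, `sizes` and `n_rows` are hypothetical.

```r
library(data.table)

set.seed(1)
x  <- data.table(t = rep(1:10, 2), z = sample(1:10, 20, replace = TRUE))
x2 <- data.table(tprime = rep(1:10, 2), zprime = sample(1:10, 20, replace = TRUE))

# Rows per key value on each side (x is keyed by the tprime it must match)
nx  <- x[, .(nx = .N), by = .(tprime = t + 1L)]
nx2 <- x2[, .(nx2 = .N), by = tprime]

# Keep only keys present on both sides; each contributes nx * nx2 rows
sizes  <- nx2[nx, on = "tprime", nomatch = NULL]
n_rows <- sizes[, sum(nx * nx2)]
n_rows  # 36: keys 2..10, each with 2 rows on either side (9 * 2 * 2)
```

This touches only one row per distinct key, so it stays cheap even when the full cross product would not fit in memory.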

The reason I can't just do a merge is that I need to cross join the other rows as well.
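One way to apply the condition before any cross product is built (a sketch, not the asker's code and not the only option) is to shift the key on one side so that tprime == t + 1 becomes an equality, then do an equi-join with `allow.cartesian = TRUE`. Within each matching key every z is still paired with every zprime, so the result should contain the same rows as filtering the full cross join, possibly in a different order. The helper names `xk` and `joined2` are hypothetical, and `nomatch = NULL` assumes a reasonably recent data.table (older versions use `nomatch = 0`).

```r
library(data.table)

set.seed(1)
x  <- data.table(t = rep(1:10, 2), z = sample(1:10, 20, replace = TRUE))
x2 <- data.table(tprime = rep(1:10, 2), zprime = sample(1:10, 20, replace = TRUE))

# Add the value each row of x must match (t + 1) to a copy, so x is unchanged
xk <- copy(x)[, tprime := t + 1L]

# Equi-join on tprime: all z/zprime pairs sharing a key are kept
# (allow.cartesian), and rows with no partner (t = 10) are dropped
joined2 <- x2[xk, on = "tprime", nomatch = NULL, allow.cartesian = TRUE]
setcolorder(joined2, c("t", "tprime", "z", "zprime"))
```

Memory use here scales with the size of the filtered result (36 rows) rather than with `nrow(x) * nrow(x2)`. For inequality conditions, data.table's non-equi joins (`on = .(col >= value)`) follow the same idea.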

Source: https://stackoverflow.com/questions/55111630/efficient-conditional-cross-join-in-data-table
