Perform a semi-join with data.table

天命终不由人 2020-11-27 16:44

How do I perform a semi-join with data.table? A semi-join is like an inner join except that it only returns the columns of X (not also those of Y), and does not repeat the r

8 Answers
  • 2020-11-27 17:42

    More possibilities:

    w = unique(x[y, which=TRUE, nomatch=0])  # row numbers in x that have a match in y; nomatch=0 drops unmatched y rows
    x[w]
    

    If x has duplicate key values, the join can return more rows than either table, so it needs allow.cartesian=TRUE:

    w = unique(x[y, which=TRUE, nomatch=0, allow.cartesian=TRUE])
    x[w]
    

    Or, the other way around:

    setkey(y, x)                                   # key y by its column x
    w = !is.na(y[x, which=TRUE, mult="first"])    # TRUE for each row of x that has at least one match in y
    x[w]
    

    If nrow(x) << nrow(y) then the y[x] approach should be faster.
    If nrow(x) >> nrow(y) then the x[y] approach should be faster.

    But the anti anti join appeals too :-)
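
    To make the two directions concrete, here is a minimal sketch on made-up data. The tables, the column names `k`, `v`, `z`, and the result names `semi1` and `semi2` are assumptions for illustration, not taken from the question. It also passes `nomatch = 0L` in the x[y] direction so that rows of y without a match do not produce NA row numbers.

    ```r
    library(data.table)

    # Hypothetical sample data: x has a duplicate key (2), y has a key with no match (4)
    x <- data.table(k = c(1L, 2L, 2L, 3L), v = c("a", "b", "c", "d"), key = "k")
    y <- data.table(k = c(2L, 2L, 4L), z = 10:12, key = "k")

    # x[y] direction: row numbers of x matched by any row of y
    w1 <- sort(unique(x[y, which = TRUE, nomatch = 0L, allow.cartesian = TRUE]))
    semi1 <- x[w1]

    # y[x] direction: for each row of x, is there at least one match in y?
    w2 <- !is.na(y[x, which = TRUE, mult = "first"])
    semi2 <- x[w2]

    print(semi1)
    print(semi2)
    ```

    Both results should contain only the rows of x whose key appears in y, each once, with only the columns of x.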

  • 2020-11-27 17:45

    Try the following:

    w <- y[, unique(x)]   # distinct values of column x in y
    x[x %in% w]           # rows of x whose column x appears in y
    

    Output will be:

       x y
    1: 1 a
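
    For a version that runs on its own, the sketch below rebuilds the two tables. The data is an assumption, chosen only to be consistent with the output shown above, and `res` is a name introduced for illustration. This approach needs no key, and duplicate values in y do not repeat rows of x.

    ```r
    library(data.table)

    # Assumed data, consistent with the output shown above
    x <- data.table(x = 1:2, y = c("a", "b"))
    y <- data.table(x = c(1L, 1L), z = 10:11)

    w <- y[, unique(x)]   # distinct values of y's column x
    res <- x[x %in% w]    # inside x[...], `x` refers to the column named x
    print(res)
    ```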
    