R: Standard evaluation for *_join (dplyr)

前端 未结 1 939
遥遥无期
遥遥无期 2020-12-31 07:08

How to join 2 tables using *_join() from dplyr, when the join variable names are different and stored in another variable?

e.g.

df1 = data_frame(x1 =         


        
相关标签:
1条回答
  • 2020-12-31 07:46

    The problem here is that you have to supply a named vector if the common column has different names in the data.frames. See what happens in your example:

    It works when you supply the names directly:

    df1 %>% left_join(df2, by = c("x1" = "x2"))
    #Source: local data frame [10 x 3]
    #
    #   x1 y1 y2
    #1   1 21 NA
    #2   2 22 NA
    #3   3 23 NA
    #4   4 24 NA
    #5   5 25 NA
    #6   6 26 26
    #7   7 27 27
    #8   8 28 28
    #9   9 29 29
    #10 10 30 30
    

    And the named vector you supply is:

    c("x1" = "x2")
    #  x1 
    #"x2" 
    

    Now if you use character vectors, the named vector changes to:

    var1 = "x1"
    var2 = "x2"
    
    c(var1 = var2)
    #var1             # <~~ this is why it doesn't work
    #"x2"
    

    I don't know if there's a "clean" way to solve this in dplyr currently. A workaround is the following tweak to construct the named vector as required:

    df1 %>% left_join(df2, by = setNames(var2, var1))
    #Source: local data frame [10 x 3]
    #
    #   x1 y1 y2
    #1   1 21 NA
    #2   2 22 NA
    #3   3 23 NA
    #4   4 24 NA
    #5   5 25 NA
    #6   6 26 26
    #7   7 27 27
    #8   8 28 28
    #9   9 29 29
    #10 10 30 30
    

    This works because:

    setNames(var2, var1)
    #  x1 
    #"x2" 
    

    Hope it helps.


    Note: you could do the same with names<- like so:

    df1 %>% left_join(df2, by = `names<-`(var2, var1))
    

    but Hadley recommends using the setNames approach instead.

    0 讨论(0)
提交回复
热议问题