Is there a way to create Stata's _merge indicator variable with R's merge()?

前端 未结 3 1521
再見小時候
再見小時候 2021-01-18 21:18

Stata automatically creates a variable called \"_merge\" indicating the matched variables in both datasets after merge. Is there a way to get such variable generated by R\'s

3条回答
  •  感动是毒
    2021-01-18 21:49

    I have written the following function based on @Metrics answer. It creates a variable "merge" in the resulting dataset that indicates observations as Stata does.

    stata.merge <- function(x,y, by = intersect(names(x), names(y))){
    
    x[is.na(x)] <- Inf
    y[is.na(y)] <- Inf
    
    matched <- merge(x, y, by.x = by, by.y = by, all = TRUE)
    matched <- matched[complete.cases(matched),]
    matched$merge <- "matched"
    master <- merge(x, y, by.x = by, by.y = by, all.x = TRUE)
    master <- master[!complete.cases(master),]
    master$merge <- "master"
    using <- merge(x, y, by.x = by, by.y = by, all.y = TRUE)
    using <- using[!complete.cases(using),]
    using$merge <- "using"
    
    df <- rbind(matched, master,using)
    df[sapply(df, is.infinite)] <- NA
    df
    }
    

    Test.

    df1 <- data.frame(id = letters[c(1:5,8:9)], v1=c(1:5,8:9))
    df1
    
       id v1
    1  a  1
    2  b  2
    3  c  3
    4  d  4
    5  e  5
    6  h  8
    7  i  9
    
    df2 <- data.frame(id = letters[1:8], v1=c(1:7,NA))
    df2
    
      id v1
    1  a  1
    2  b  2
    3  c  3
    4  d  4
    5  e  5
    6  f  6
    7  g  7
    8  h NA
    
    stata.merge(df1,df2, by = "id")
    
       id v1.x v1.y   merge
    1   a    1    1 matched
    2   b    2    2 matched
    3   c    3    3 matched
    4   d    4    4 matched
    5   e    5    5 matched
    6   h    8   NA matched
    7   i    9   NA  master
    71  f   NA    6   using
    8   g   NA    7   using
    

提交回复
热议问题