Find all sequences with the same column value

误落风尘 2020-12-17 18:35

I have the following data frame:

╔══════╦═════════╗
║ Code ║ Airline ║
╠══════╬═════════╣
║    1 ║ AF      ║
║    1 ║ KL      ║
║    8 ║ AR      ║
║    8 ║ A         


        
9 answers
  •  挽巷 (OP)
    2020-12-17 19:06

    There is likely a more efficient route, but this should fly:

    # example data
    d <- data.frame(code = c(1,1,8,8,8),
         airline = c("AF","KL","AR","AZ","DL"),
         stringsAsFactors = FALSE)
    
    # merge d to itself on the code column.  This isn't necessarily efficient
    d2 <- merge(d, d, by = "code")
    
    # drop the self-pairs: rows where airline.x and airline.y
    # (the two airline columns created by the merge) are equal
    d2 <- d2[d2[["airline.x"]] != d2[["airline.y"]], ]
    
    # construct the combinations for each airline using a split, apply, combine
    # then, use stack to get a nice structure for merging
    d2 <- stack(
          lapply(split(d2, d2[["airline.x"]]),
            function(ii) paste0(ii$airline.y, collapse = ",")))
    
    # merge d and d2.  "ind" is a column produced by stack
    merge(d, d2, by.x = "airline", by.y = "ind")
    #  airline code values
    #1      AF    1     KL
    #2      AR    8  AZ,DL
    #3      AZ    8  AR,DL
    #4      DL    8  AR,AZ
    #5      KL    1     AF
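
As one possible "more efficient route", the self-merge can be avoided by working within each code group directly. The following is only a sketch in base R, not the approach from the answer above: the helper name `others` and the new column name `values` are chosen here for illustration. It drops each airline by position rather than by name, so a duplicated airline within one code would still appear in its own list, and an airline that is alone in its code gets an empty string instead of being dropped from the result.

```r
# example data (same as above)
d <- data.frame(code = c(1, 1, 8, 8, 8),
                airline = c("AF", "KL", "AR", "AZ", "DL"),
                stringsAsFactors = FALSE)

# hypothetical helper: for each element of x, paste together all the
# other elements of x (the element itself is removed by position)
others <- function(x) {
  vapply(seq_along(x),
         function(i) paste(x[-i], collapse = ","),
         character(1))
}

# ave() applies the helper within each code group and returns the
# results in the original row order
d$values <- ave(d$airline, d$code, FUN = others)
d$values
# [1] "KL"    "AF"    "AZ,DL" "AR,DL" "AR,AZ"
```

Unlike the merge-based version, the rows stay in their original order rather than being sorted by airline.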
    
