“Set Difference” between two vectors with duplicate values

孤城傲影 2020-12-06 20:37

I have 3 vectors

x <- c(1,3,5,7,3,8)
y <- c(3,5,7)
z <- c(3,3,8)

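I want to find the elements of x that are not in y (and, likewise, not in z), counting duplicates: each value in y should remove only one matching occurrence from x.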

3 Answers
  •  天命终不由人
    2020-12-06 21:12

    There are probably better ways to do this, but here is one option:

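    get_diff_vectors <- function(x, y) {
      count_x <- table(x)   # frequency of each distinct value in x
      count_y <- table(y)   # frequency of each distinct value in y
      # positions in count_x of the values that also occur in y
      same_counts <- match(names(count_y), names(count_x))
      # subtract the counts of y from the matching counts of x
      count_x[same_counts] <- count_x[same_counts] - count_y
      # rebuild the vector by repeating each value by its remaining count
      as.numeric(rep(names(count_x), count_x))
    }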
    
    get_diff_vectors(x, y)
    #[1] 1 3 8
    get_diff_vectors(x, z)
    #[1] 1 5 7
    get_diff_vectors(x, c(5, 7))
    #[1] 1 3 3 8
    

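    We count the frequency of each value in x and y using table, match the values that occur in both, and subtract the counts of y from those of x. Finally, we rebuild the remaining vector using rep. Note that this assumes every value in y occurs in x at least as often; the function is not written to handle other inputs.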
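
    To make these steps concrete, here is a small trace of the intermediate values for the x and y from the question. It is only a sketch: the names idx, remaining and result are introduced here for illustration and are not part of get_diff_vectors itself.

    ```r
    x <- c(1, 3, 5, 7, 3, 8)
    y <- c(3, 5, 7)

    count_x <- table(x)   # values 1 3 5 7 8 with counts 1 2 1 1 1
    count_y <- table(y)   # values 3 5 7 with counts 1 1 1

    # Positions in count_x of the values that also occur in y
    idx <- match(names(count_y), names(count_x))   # 2 3 4

    # Remaining count of each value once the occurrences in y are removed
    remaining <- as.vector(count_x)
    remaining[idx] <- remaining[idx] - as.vector(count_y)   # 1 1 0 0 1

    # Repeat each distinct value by its remaining count
    result <- rep(as.numeric(names(count_x)), times = remaining)
    result
    #[1] 1 3 8
    ```

    Because table sorts the distinct values, the result comes back in sorted order rather than in the original order of x.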


    I still have not found a better way, but here is a dplyr version that uses similar logic.

    library(dplyr)
    
    get_diff_vectors_dplyr <- function(x, y) {
      df1 <- data.frame(x) %>% count(x)   # value counts for x
      df2 <- data.frame(y) %>% count(y)   # value counts for y
      final <- left_join(df1, df2, by = c("x" = "y")) %>%
               # values absent from y get NA after the join; treat them as 0
               # (coalesce replaces the deprecated funs() helper)
               mutate(n.y = coalesce(n.y, 0L),
                      n = n.x - n.y)
    
      rep(final$x, final$n)
    }
    
    get_diff_vectors_dplyr(x, y)
    #[1] 1 3 8
    get_diff_vectors_dplyr(x, z)
    #[1] 1 5 7
    get_diff_vectors_dplyr(x, c(5, 7))
    #[1] 1 3 3 8
    

    The vecsets package mentioned by the OP has a function, vsetdiff, which does this directly:

    vecsets::vsetdiff(x, y)
    #[1] 1 3 8
    vecsets::vsetdiff(x, z)
    #[1] 1 5 7
    vecsets::vsetdiff(x, c(5, 7))
    #[1] 1 3 3 8
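
    If installing a package is not an option, the same multiset difference can be sketched in base R by removing, for each element of y, its first remaining occurrence in x. The name multiset_diff is hypothetical and does not come from the answers above. Unlike the table-based version, this sketch keeps the original order of x and simply ignores values of y that do not occur in x.

    ```r
    # Hypothetical helper (an illustration, not from any package):
    # remove one occurrence from x for every element of y.
    multiset_diff <- function(x, y) {
      for (value in y) {
        pos <- match(value, x)          # first remaining occurrence in x
        if (!is.na(pos)) x <- x[-pos]   # drop it; skip values absent from x
      }
      x
    }

    x <- c(1, 3, 5, 7, 3, 8)
    y <- c(3, 5, 7)
    z <- c(3, 3, 8)

    multiset_diff(x, y)
    #[1] 1 3 8
    multiset_diff(x, z)
    #[1] 1 5 7
    multiset_diff(x, c(3, 9))
    #[1] 1 5 7 3 8
    ```

    The loop makes one pass over x per element of y, so it is meant for short vectors; for large inputs the count-based approaches above are likely to scale better.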
    
