Minus operation of data frames

忘掉有多难 2020-12-05 18:53

I have 2 data frames df1 and df2.

df1 <- data.frame(c1=c("a","b","c","d"),c2=c(1,2,3,4) )
df2 <- data.frame(c1=c("


        
7 answers
  •  时光取名叫无心
    2020-12-05 19:05

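    One issue with https://stackoverflow.com/a/16144262/2055486 is that it assumes neither data frame already contains duplicated rows. The following function removes that limitation and also works with arbitrary user-defined columns in x or y. It returns a logical vector marking the rows of x that have no match in y, so it is used as a row index.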

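    The implementation borrows an idea from duplicated.data.frame: paste the columns of each row together with a separator and compare the resulting strings. duplicated.data.frame uses "\r", which can cause collisions if the entries contain embedded "\r" characters. This function uses the control character "\30" instead, which is far less likely to appear in input data. (R reads "\30" as an octal escape, i.e. character code 24; the ASCII record separator proper is code 30, written "\036" or "\x1e" in R. Either one serves the purpose here.)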
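To make the collision concern concrete, here is a minimal sketch with made-up data (the data frames `x` and `y` below are illustrative, not from the question). It shows two distinct rows producing the same key when pasted without a separator, and that a control-character separator only helps as long as the data does not contain that character.

```r
# Two rows that differ, but whose values run together identically
# when pasted without a separator: "ab" + "c" versus "a" + "bc".
x <- data.frame(c1 = c("ab", "a"), c2 = c("c", "bc"), stringsAsFactors = FALSE)

key_nosep <- do.call(paste, c(x, sep = ""))     # both rows become "abc"
key_sep   <- do.call(paste, c(x, sep = "\30"))  # control-character separator

collide_nosep <- anyDuplicated(key_nosep) > 0   # keys collide
collide_sep   <- anyDuplicated(key_sep) > 0     # rows stay distinct

# A collision is still possible if the data itself contains the separator.
y <- data.frame(c1 = c("a\30b", "a"), c2 = c("c", "b\30c"), stringsAsFactors = FALSE)
collide_embedded <- anyDuplicated(do.call(paste, c(y, sep = "\30"))) > 0

print(c(nosep = collide_nosep, sep = collide_sep, embedded = collide_embedded))
```

The function that follows builds its row keys the same way, passing the separator through `sep`.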

    setdiff.data.frame <- function(x, y,
        by = intersect(names(x), names(y)),
        by.x = by, by.y = by) {
      stopifnot(
        is.data.frame(x),
        is.data.frame(y),
        length(by.x) == length(by.y))
    
      !do.call(paste, c(x[by.x], sep = "\30")) %in% do.call(paste, c(y[by.y], sep = "\30"))
    }
    
    # Example usage
    # remove the rows for 4- or 6-cylinder cars with 4 gears and 8-cylinder cars with 3 gears
    to_remove <- data.frame(cyl = c(4, 6, 8), gear = c(4, 4, 3))
    mtcars[setdiff.data.frame(mtcars, to_remove), ]
    #>                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    #> Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    #> Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
    #> Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
    #> Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
    #> Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
    #> Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
    #> Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
    #> Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
    
    # with differing column names
    to_remove2 <- data.frame(a = c(4, 6, 8), b = c(4, 4, 3))
    mtcars[setdiff.data.frame(mtcars, to_remove2, by.x = c("cyl", "gear"), by.y = c("a", "b")), ]
    #>                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    #> Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    #> Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
    #> Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
    #> Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
    #> Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
    #> Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
    #> Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
    #> Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
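As a further check on the duplicated-rows point, the sketch below runs the function on two small, hypothetical data frames in which both inputs contain repeated rows. The function body is copied from the answer so the block runs on its own; the inputs `x` and `y` are assumptions made for illustration.

```r
# Function copied from the answer above so this block is self-contained.
setdiff.data.frame <- function(x, y,
    by = intersect(names(x), names(y)),
    by.x = by, by.y = by) {
  stopifnot(is.data.frame(x), is.data.frame(y), length(by.x) == length(by.y))
  !do.call(paste, c(x[by.x], sep = "\30")) %in% do.call(paste, c(y[by.y], sep = "\30"))
}

# Hypothetical inputs: x repeats the row ("a", 1); y repeats the row ("b", 2).
x <- data.frame(c1 = c("a", "a", "b", "c"), c2 = c(1, 1, 2, 3))
y <- data.frame(c1 = c("b", "b"), c2 = c(2, 2))

# Logical index: TRUE for rows of x with no matching row in y.
keep <- setdiff.data.frame(x, y)
result <- x[keep, ]
print(result)
```

Both copies of the ("a", 1) row survive, and the repeated row in y causes no trouble. In that sense the function behaves more like an anti-join than a strict set difference, which would also collapse duplicates in x.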
    
