Implementation of skyline query or efficient frontier

慢半拍i 2020-12-06 02:44

I know there must be an easy answer to this but somehow I can't seem to find it...

I have a data frame with 2 numeric columns. I would like to remove from it, the r

6 answers
  •  隐瞒了意图╮
    2020-12-06 02:47

    Edit (2015-03-02): For a more efficient solution, please see Patrick Roocks' rPref, a package for "Database Preferences and Skyline Computation" (also linked to in his answer below). To show that it finds the same solution as my code, I've appended an example using it to my original answer.


    Riffing off of Vincent Zoonekynd's enlightening response, here's an algorithm that's fully vectorized, and likely more efficient:

    set.seed(100)
    d <- data.frame(x = rnorm(100), y = rnorm(100))
    
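    # Sort by x, then y, both descending, so every earlier row has x at least as large.
    D   <- d[order(d$x, d$y, decreasing=TRUE), ]
    # Keep only the rows at which the running maximum of y reaches a new value.
    res <- D[which(!duplicated(cummax(D$y))), ]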
    #             x         y
    # 64  2.5819589 0.7946803
    # 20  2.3102968 1.6151907
    # 95 -0.5302965 1.8952759
    # 80 -2.0744048 2.1686003
    
    
    # And then, if you would prefer the rows to be in 
    # their original order, just do:
    d[sort(as.numeric(rownames(res))), ]
    #            x         y
    # 20  2.3102968 1.6151907
    # 64  2.5819589 0.7946803
    # 80 -2.0744048 2.1686003
    # 95 -0.5302965 1.8952759
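
    To see why the sort-then-cummax trick works, here is a minimal sketch on a five-row toy data frame that can be checked by hand. The `pts` data and the names `P`, `run_max` and `toy_res` are made up for illustration and do not come from the question; the sketch assumes, as the code above does, that larger is better in both columns.

    ```r
    # Hypothetical toy data (not from the question).
    pts <- data.frame(x = c(1, 2, 3, 4, 5), y = c(5, 3, 4, 1, 2))

    # Sort by x (then y) descending: every earlier row has x at least as large.
    P <- pts[order(pts$x, pts$y, decreasing = TRUE), ]

    # Running maximum of y over the sorted rows.
    run_max <- cummax(P$y)

    # A row survives only if it raises the running maximum, i.e. its y is
    # strictly larger than the y of every row sorted before it.
    toy_res <- P[!duplicated(run_max), ]
    toy_res
    ```

    Here (4, 1) drops out because (5, 2) beats it on both columns, and (2, 3) drops out because (3, 4) does. One consequence of using `duplicated()` is that exact duplicate rows collapse to a single row.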
    

    Or, using the rPref package:

    library(rPref)
    psel(d, high(x) | high(y))
    #             x         y
    # 20  2.3102968 1.6151907
    # 64  2.5819589 0.7946803
    # 80 -2.0744048 2.1686003
    # 95 -0.5302965 1.8952759
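
    As a sanity check on either result, the dominance rule can also be applied directly, one row at a time. This is a slow O(n^2) sketch and not part of the original answer. `is_dominated` is a hypothetical helper name, and the sketch assumes a row counts as dominated when some other row is at least as large in both columns and strictly larger in at least one.

    ```r
    set.seed(100)
    d <- data.frame(x = rnorm(100), y = rnorm(100))

    # Hypothetical helper: TRUE if any row of d dominates row i
    # (>= in both columns and > in at least one).
    is_dominated <- function(i, d) {
      any(d$x >= d$x[i] & d$y >= d$y[i] & (d$x > d$x[i] | d$y > d$y[i]))
    }

    # Brute force: test every row against every other row.
    keep  <- !vapply(seq_len(nrow(d)), is_dominated, logical(1), d = d)
    brute <- d[keep, ]

    # Vectorized version from the answer above.
    D    <- d[order(d$x, d$y, decreasing = TRUE), ]
    fast <- D[!duplicated(cummax(D$y)), ]

    # Both approaches should select the same rows when the data has no ties.
    stopifnot(setequal(rownames(brute), rownames(fast)))
    ```

    The two can differ only when the data contains exact duplicate rows: the brute-force rule keeps every copy, while the vectorized version keeps one.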
    
