Implementation of skyline query or efficient frontier

前端未结

关注

 6  1784

慢半拍i 2020-12-06 02:44

I know there must be an easy answer to this but somehow I can\'t seem to find it...

I have a data frame with 2 numeric columns. I would like to remove from it, the r

6条回答

隐瞒了意图╮ (楼主)

2020-12-06 02:47

Edit (2015-03-02): For a more efficient solution, please see Patrick Roocks' rPref, a package for "Database Preferences and Skyline Computation", (also linked to in his answer below). To show that it finds the same solution as my code here, I've appended an example using it to my original answer here.

Riffing off of Vincent Zoonekynd's enlightening response, here's an algorithm that's fully vectorized, and likely more efficient:

set.seed(100)
d <- data.frame(x = rnorm(100), y = rnorm(100))

D   <- d[order(d$x, d$y, decreasing=TRUE), ]
res <- D[which(!duplicated(cummax(D$y))), ]
#             x         y
# 64  2.5819589 0.7946803
# 20  2.3102968 1.6151907
# 95 -0.5302965 1.8952759
# 80 -2.0744048 2.1686003


# And then, if you would prefer the rows to be in 
# their original order, just do:
d[sort(as.numeric(rownames(res))), ]
#            x         y
# 20  2.3102968 1.6151907
# 64  2.5819589 0.7946803
# 80 -2.0744048 2.1686003
# 95 -0.5302965 1.8952759

Or, using the rPref package:

library(rPref)
psel(d, high(x) | high(y))
#             x         y
# 20  2.3102968 1.6151907
# 64  2.5819589 0.7946803
# 80 -2.0744048 2.1686003
# 95 -0.5302965 1.8952759

0 讨论(0)

查看其它6个回答