How to select unique point

Question

I am a novice R programmer. I have the following series of points.

library(dplyr)

df <- data.frame(x = c(1, 2, 3, 4), y = c(6, 3, 7, 5))
df <- df %>% mutate(k = 1)              # constant key for the self-join
df <- df %>% full_join(df, by = 'k')    # pair every point with every point
df <- subset(df, select = c('x.x', 'y.x', 'x.y', 'y.y'))
df

Is there a way to select only the "unique" pairs of points? (The order of the points within a pair does not matter.)

EDIT:

x.x y.x x.y y.y
1   6   2   3
2   3   3   7
.
.
.

(I changed the 2 to 7 to clarify the problem)

Answer 1:

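With data.table (and working from the OP's initial df, i.e. the four-row data frame of x and y before the self-join; the output below reflects the data before the edit, when the third point was (3, 2)):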

library(data.table)
setDT(df)

df[, r := .I ]
df[df, on=.(r > r), nomatch=0]


   x y r i.x i.y
1: 2 3 1   1   6
2: 3 2 1   1   6
3: 4 5 1   1   6
4: 3 2 2   2   3
5: 4 5 2   2   3
6: 4 5 3   3   2

This is a "non-equi join" on row numbers. In x[i, on=.(r > r)], the left-hand r refers to column r of x and the right-hand r to column r of i, so each row of i is matched with every row of x that has a larger row number. The columns named like i.* are taken from i.

Data.table joins, which are of the form x[i], use i to look up rows of x. The nomatch=0 option drops rows of i that find no matches.
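To see what nomatch=0 changes, here is a minimal sketch that runs the same join with and without it. It uses the question's four points (after the edit); the names pts, with_unmatched and pairs_only are made up for this illustration.

```r
library(data.table)

# The question's four points; `pts` is a name chosen for this sketch.
pts <- data.table(x = c(1, 2, 3, 4), y = c(6, 3, 7, 5))
pts[, r := .I]    # row number column

# Default (nomatch = NA): every row of i is kept. The last point (r = 4)
# has no row of x with a larger r, so it appears once with NA in x and y.
with_unmatched <- pts[pts, on = .(r > r)]

# nomatch = 0 drops that unmatched row, leaving only the real pairs.
pairs_only <- pts[pts, on = .(r > r), nomatch = 0]

nrow(with_unmatched)
nrow(pairs_only)
```

The first result should have exactly one more row than the second: the row for the last point, which has nothing to pair with.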

Answer 2:

In the tidyverse, you can save a bit of work by doing the self-join with tidyr::crossing. If you add row indices pre-join, reducing is a simple filter call:

library(tidyverse)

df %>% mutate(i = row_number()) %>%    # add row index column
    crossing(., .) %>%    # Cartesian self-join
    filter(i < i1) %>%    # reduce to lower indices
    select(-i, -i1)    # remove extraneous columns

##   x y x1 y1
## 1 1 6  2  3
## 2 1 6  3  7
## 3 1 6  4  5
## 4 2 3  3  7
## 5 2 3  4  5
## 6 3 7  4  5
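Depending on the installed tidyr version, crossing(., .) may reject the duplicated column names instead of appending 1 to them. A sketch of the same idea that does not rely on automatic renaming is to rename the second copy explicitly before crossing. The names indexed, second and pairs are chosen for this example only.

```r
library(dplyr)
library(tidyr)

df <- data.frame(x = c(1, 2, 3, 4), y = c(6, 3, 7, 5))

indexed <- df %>% mutate(i = row_number())               # add row index column
second  <- indexed %>% rename(x1 = x, y1 = y, i1 = i)    # same rows, distinct column names

pairs <- crossing(indexed, second) %>%    # Cartesian product of the two copies
    filter(i < i1) %>%                    # keep each unordered pair once
    select(-i, -i1)                       # remove the index columns
pairs
```

This should give the same six pairs as above, with the columns x, y, x1 and y1.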

Or, entirely in base R:

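df$m <- 1                             # constant key, so merge() pairs every row with every row
df$i <- seq(nrow(df))                 # row index
df <- merge(df, df, by = 'm')         # Cartesian self-join
df[df$i.x < df$i.y, c(-1, -4, -7)]    # keep each pair once; drop m, i.x and i.y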

##    x.x y.x x.y y.y
## 2    1   6   2   3
## 3    1   6   3   7
## 4    1   6   4   5
## 7    2   3   3   7
## 8    2   3   4   5
## 12   3   7   4   5
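A different base R route, not the one used above, is to skip the join altogether and let combn() enumerate the index pairs directly. This is only a sketch; pts, idx and pairs are illustrative names, and the output column names are chosen to match the merge() result.

```r
# The question's four points; `pts` is a name chosen for this sketch.
pts <- data.frame(x = c(1, 2, 3, 4), y = c(6, 3, 7, 5))

# combn(n, 2) returns a 2-row matrix; each column is one pair of row
# indices with the first index smaller than the second.
idx <- combn(nrow(pts), 2)

# Look up the coordinates of the first and second point of every pair.
pairs <- data.frame(
    x.x = pts$x[idx[1, ]], y.x = pts$y[idx[1, ]],
    x.y = pts$x[idx[2, ]], y.y = pts$y[idx[2, ]]
)
pairs
```

Because combn() never produces a pair twice, no filtering step is needed, and the full 16-row product is never built.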

Answer 3:

You can use the duplicated.matrix() function from base R to find the rows that are not duplicates, which in fact means that they are unique. When you call it, you have to specify that you only want to use the first two columns. This call tells you which rows are unique. In a second step, you index your data frame with these rows, keeping all columns.

unique_lines <- !duplicated.matrix(df[, c(1, 2)])
df[unique_lines, ]
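Note that comparing only the first two columns looks at the first point of each row, not at the pair. If the goal is to treat (A, B) and (B, A) as the same pair, one possible adaptation is to build an order-independent key per row and call duplicated() on that key. This is a sketch under that assumption; pts, idx, pairs, key and unique_pairs are names invented here, and the 16-row table is rebuilt with expand.grid() so the block runs on its own.

```r
# Rebuild the 16-row table of all point pairs from the question's data.
pts <- data.frame(x = c(1, 2, 3, 4), y = c(6, 3, 7, 5))
idx <- expand.grid(a = seq_len(nrow(pts)), b = seq_len(nrow(pts)))
pairs <- data.frame(
    x.x = pts$x[idx$a], y.x = pts$y[idx$a],
    x.y = pts$x[idx$b], y.y = pts$y[idx$b]
)

# Drop rows that pair a point with itself.
pairs <- pairs[!(pairs$x.x == pairs$x.y & pairs$y.x == pairs$y.y), ]

# Order-independent key: write each point as a string, then put the
# smaller string first so (A, B) and (B, A) produce the same key.
p1 <- paste(pairs$x.x, pairs$y.x)
p2 <- paste(pairs$x.y, pairs$y.y)
key <- paste(pmin(p1, p2), pmax(p1, p2), sep = "|")

# Keep the first row seen for every key.
unique_pairs <- pairs[!duplicated(key), ]
unique_pairs
```

With four distinct points this should leave six rows, one per unordered pair.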

Source: https://stackoverflow.com/questions/43315185/how-to-select-unique-point

Tags

dplyr

combinations