Question
I am a novice R programmer. I have the following series of points.
library(dplyr)
df <- data.frame(x = c(1, 2, 3, 4), y = c(6, 3, 7, 5))
df <- df %>% mutate(k = 1)              # constant key, so the join pairs everything
df <- df %>% full_join(df, by = 'k')    # every point paired with every point
df <- subset(df, select = c('x.x', 'y.x', 'x.y', 'y.y'))
df
Is there a way to select only the "unique" pairs of points? (The order of the points within a pair does not matter.)
EDIT:
x.x y.x x.y y.y
1 6 2 3
2 3 3 7
.
.
.
(I changed the 2 to 7 to clarify the problem.)
Answer 1:
With data.table (and working from the OP's initial df):
library(data.table)
setDT(df)
df[, r := .I ]
df[df, on=.(r > r), nomatch=0]
x y r i.x i.y
1: 2 3 1 1 6
2: 3 2 1 1 6
3: 4 5 1 1 6
4: 3 2 2 2 3
5: 4 5 2 2 3
6: 4 5 3 3 2
This is a "non-equi join" on row numbers. In x[i, on=.(r > r)], the left-hand r refers to the row in x and the right-hand one to a row of i. The columns named like i.* are taken from i.
Joins in data.table, which are of the form x[i], use i to look up rows of x. The nomatch=0 option drops rows of i that find no matches.
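To see what nomatch=0 changes, here is a small sketch that runs the same self-join with and without it. The table name pts and the result names are made up for illustration, the data is the OP's edited df, and it assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical table name; same four points as in the question (after the edit)
pts <- data.table(x = c(1, 2, 3, 4), y = c(6, 3, 7, 5))
pts[, r := .I]                       # row number as a join column

# Default: every row of i is kept. The last row (r = 4) has no partner
# in x with a larger r, so it appears once with NA in the x columns.
with_unmatched <- pts[pts, on = .(r > r)]

# nomatch = 0 drops that unmatched row, leaving one row per unordered pair
pairs_only <- pts[pts, on = .(r > r), nomatch = 0]

nrow(with_unmatched)
nrow(pairs_only)
```

If the join behaves as described above, the second count should equal choose(4, 2), the number of unordered pairs of four points, and the first should be one larger.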
Answer 2:
In the tidyverse, you can save a bit of work by doing the self-join with tidyr::crossing. If you add row indices before the join, reducing the result is a simple filter call:
library(tidyverse)
df %>% mutate(i = row_number()) %>%    # add row index column
    crossing(., .) %>%                 # Cartesian self-join
    filter(i < i1) %>%                 # reduce to lower indices
    select(-i, -i1)                    # remove extraneous columns
## x y x1 y1
## 1 1 6 2 3
## 2 1 6 3 7
## 3 1 6 4 5
## 4 2 3 3 7
## 5 2 3 4 5
## 6 3 7 4 5
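If crossing(., .) complains about duplicated column names in your tidyr version, a similar pipeline can be sketched with dplyr::cross_join(). This is an alternative, not what the answer above ran. It assumes dplyr 1.1.0 or later, where cross_join() is available, and the joined columns get the suffixes .x and .y rather than a trailing 1. The names indexed and pairs are illustrative.

```r
library(dplyr)

df <- data.frame(x = c(1, 2, 3, 4), y = c(6, 3, 7, 5))

indexed <- df %>% mutate(i = row_number())   # add row index column

pairs <- indexed %>%
  cross_join(indexed) %>%                    # Cartesian self-join (dplyr >= 1.1.0)
  filter(i.x < i.y) %>%                      # keep each unordered pair once
  select(-i.x, -i.y)                         # remove the index columns

pairs
```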
or in all base R,
df$m <- 1                        # constant merge key
df$i <- seq(nrow(df))            # row index
df <- merge(df, df, by = 'm')    # Cartesian self-join
df[df$i.x < df$i.y, c(-1, -4, -7)]    # keep lower-index pairs; drop m, i.x, i.y
## x.x y.x x.y y.y
## 2 1 6 2 3
## 3 1 6 3 7
## 4 1 6 4 5
## 7 2 3 3 7
## 8 2 3 4 5
## 12 3 7 4 5
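Another base R possibility, not part of the answer above, is to skip the Cartesian product and generate the index pairs directly with combn(). A minimal sketch, assuming the original four-row df; the names idx and pairs are illustrative:

```r
df <- data.frame(x = c(1, 2, 3, 4), y = c(6, 3, 7, 5))

# 2 x 6 matrix; each column holds one pair of row indices with i < j
idx <- combn(nrow(df), 2)

# Put the first and second point of each pair side by side
pairs <- cbind(df[idx[1, ], ], df[idx[2, ], ])
names(pairs) <- c("x.x", "y.x", "x.y", "y.y")
rownames(pairs) <- NULL

pairs
```

Because combn() only ever produces i < j, no filtering step is needed afterwards.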
Answer 3:
You can use the duplicated.matrix() function from base R to find the rows that are not duplicates, which means that they are in fact unique. When you call the duplicated() function, you have to specify that you only want to use the first two columns. This call checks which rows are unique. In a second step, you subset your data frame to these rows, keeping all columns.
unique_lines = !duplicated.matrix(df[,c(1,2)])
df[unique_lines,]
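Since the question says the order of the points does not matter, duplicated() needs a key that is the same for (A, B) and (B, A). The following is one possible sketch, not the answer's own code: it builds such a key with pmin() and pmax() on pasted coordinates, and it also drops pairs of a point with itself. The names full, a, b, key and unique_pairs are made up for illustration.

```r
df <- data.frame(x = c(1, 2, 3, 4), y = c(6, 3, 7, 5))

# All 16 ordered pairs; merge() with by = NULL gives the Cartesian product
# and renames the columns to x.x, y.x, x.y, y.y
full <- merge(df, df, by = NULL)

a <- paste(full$x.x, full$y.x)       # first point as a string
b <- paste(full$x.y, full$y.y)       # second point as a string

# Order-independent key: the smaller string first, the larger second
key <- paste(pmin(a, b), pmax(a, b), sep = "|")

# Keep the first occurrence of each key, excluding a point paired with itself
unique_pairs <- full[a != b & !duplicated(key), ]
unique_pairs
```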
Source: https://stackoverflow.com/questions/43315185/how-to-select-unique-point