How to select unique point

怎甘沉沦 submitted on 2020-01-04 03:51:11

Question


I am a novice R programmer. I have the following series of points.

library(dplyr)

df <- data.frame(x = c(1, 2, 3, 4), y = c(6, 3, 7, 5))
df <- df %>% mutate(k = 1)              # constant key for the self-join
df <- df %>% full_join(df, by = 'k')    # pair every point with every point
df <- subset(df, select = c('x.x', 'y.x', 'x.y', 'y.y'))
df

Is there a way to select only the "unique" pairs of points? (The order of the points within a pair does not matter.)

EDIT:

x.x y.x x.y y.y
1   6   2   3
2   3   3   7
.
.
.

(I changed the 2 to 7 to clarify the problem)


Answer 1:


With data.table (and working from the OP's initial df):

library(data.table)
setDT(df)

df[, r := .I ]
df[df, on=.(r > r), nomatch=0]


   x y r i.x i.y
1: 2 3 1   1   6
2: 3 7 1   1   6
3: 4 5 1   1   6
4: 3 7 2   2   3
5: 4 5 2   2   3
6: 4 5 3   3   7

This is a "non-equi join" on row numbers. In x[i, on=.(r > r)] the left-hand r refers to the row in x and the right-hand one to a row of i. The columns named like i.* are taken from i.

Data.table joins, which are of the form x[i], use i to look up rows of x. The nomatch=0 option drops rows of i that find no matches.
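
To make the join condition concrete, here is a minimal base R sketch of the same row-number comparison, assuming the OP's initial four-row df. The names pts and pairs are introduced only for this illustration, and this is not how data.table implements the join internally.

```r
# Illustration only: reproduce the r > r condition without data.table.
pts <- data.frame(x = c(1, 2, 3, 4), y = c(6, 3, 7, 5))
r <- seq_len(nrow(pts))

# All (x-row, i-row) combinations, then keep those where the row number
# taken from x is larger than the row number taken from i.
pairs <- expand.grid(x_row = r, i_row = r)
pairs <- pairs[pairs$x_row > pairs$i_row, ]

# Row 4 of i has no larger row number in x, so it contributes nothing;
# that is the case nomatch=0 drops in the join.
nrow(pairs)
setdiff(r, pairs$i_row)
```

Of the 16 possible combinations, only the 6 with a strictly larger row number on the x side survive, which is why each unordered pair appears exactly once in the join result.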




Answer 2:


In the tidyverse, you can save a bit of work by doing the self-join with tidyr::crossing. If you add row indices pre-join, reducing is a simple filter call:

library(tidyverse)

df %>% mutate(i = row_number()) %>%    # add row index column
    crossing(., .) %>%    # Cartesian self-join
    filter(i < i1) %>%    # reduce to lower indices
    select(-i, -i1)    # remove extraneous columns

##   x y x1 y1
## 1 1 6  2  3
## 2 1 6  3  7
## 3 1 6  4  5
## 4 2 3  3  7
## 5 2 3  4  5
## 6 3 7  4  5

or in all base R,

df$m <- 1
df$i <- seq(nrow(df))
df <- merge(df, df, by = 'm')
df[df$i.x < df$i.y, c(-1, -4, -7)]

##    x.x y.x x.y y.y
## 2    1   6   2   3
## 3    1   6   3   7
## 4    1   6   4   5
## 7    2   3   3   7
## 8    2   3   4   5
## 12   3   7   4   5
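
As an alternative to filtering a full self-join, the unordered pairs can also be enumerated directly. The following is a sketch under the assumption that the starting point is the OP's initial four-row data frame; the names pts, idx, first, second and res are made up for this example and do not come from the answers above.

```r
# Sketch: enumerate unordered pairs of rows directly with combn().
pts <- data.frame(x = c(1, 2, 3, 4), y = c(6, 3, 7, 5))

idx <- combn(nrow(pts), 2)     # 2 x 6 matrix; each column is one pair i < j
first  <- pts[idx[1, ], ]      # first point of each pair
second <- pts[idx[2, ], ]      # second point of each pair

res <- data.frame(x.x = first$x,  y.x = first$y,
                  x.y = second$x, y.y = second$y)
res
```

This never builds the 16-row Cartesian product, which may matter when the number of points grows, since only n * (n - 1) / 2 rows are ever created.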



Answer 3:


You can use the duplicated.matrix() function from base R to find the rows that are not duplicates, which means they are unique. When you call it, you have to specify that only the first two columns should be compared. This call flags which rows are unique. In a second step, you subset the data frame to those rows, keeping all columns.

unique_lines = !duplicated.matrix(df[,c(1,2)])
df[unique_lines,]
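
Since the question asks for pairs where the order of the two points does not matter, a duplicated()-based approach may need a key that is identical for (A, B) and (B, A). The sketch below is one possible way to do that. It assumes the four points are distinct and that pasting the coordinates gives an unambiguous label; the names pts, joined, a, b, key and keep are hypothetical.

```r
# Sketch: drop self-pairs and mirrored pairs from the full self-join.
pts <- data.frame(x = c(1, 2, 3, 4), y = c(6, 3, 7, 5))
joined <- merge(pts, pts, by = NULL)   # Cartesian product: x.x, y.x, x.y, y.y

# Label each point, then build a key that ignores the order within a pair.
a <- paste(joined$x.x, joined$y.x)
b <- paste(joined$x.y, joined$y.y)
key <- paste(pmin(a, b), pmax(a, b), sep = " | ")

# Keep a row if its two points differ and its key has not been seen before.
keep <- a != b & !duplicated(key)
joined[keep, ]
```

Which of the two mirrored rows is kept depends on the row order of the self-join, so the left and right points may appear swapped compared with the other answers.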


Source: https://stackoverflow.com/questions/43315185/how-to-select-unique-point
