问题
I need some help to filter a dataframe.
The df has several columns and I want to split it into two dataframes:
1- One including only the rows in which the first column is a duplicate (including all of the replicas).
2- The rest of the rows, which are not duplicates.
Here is an example: This would be the original.
V1 V2
[1,] "A" "1"
[2,] "B" "1"
[3,] "A" "1"
[4,] "C" "2"
[5,] "D" "3"
[6,] "D" "4"
I want to turn into this:
V1 V2
[1,] "A" "1"
[2,] "A" "1"
[3,] "D" "3"
[4,] "D" "4"
And this:
V1 V2
[1,] "B" "1"
[2,] "C" "2"
Is there a way to do that? I have tried exporting to Excel, but the dataset was too large to make that viable.
Thank you
回答1:
Try
d[!duplicated(d),]
and
d[duplicated(d),]
where d
is your database.
=== UPDATE === If only the first column is desired, and all duplicates need to go in a separate column, you could do:
library(gdata)
d[duplicated2(d$V1,bothWays = T),]
d[!duplicated2(d$V1,bothWays = T),]
If only base R is desired, then:
bm <- duplicated(d$V1) | duplicated(d$V1,fromLast = TRUE)
d[bm,]
d[!bm,]
回答2:
You can use duplicated
but bear in mind that duplicated
only returns TRUE
at the first duplicated value, i.e.
> duplicated(c("A", "A", "A"))
[1] FALSE TRUE TRUE
does not return TRUE
for the first "A"
. If you want to catch all values of "A"
you can e.g. use
duplicated(c("A", "A", "A")) | duplicated(c("A", "A", "A"), fromLast = TRUE)
# [1] TRUE TRUE TRUE
You can then separate your data using
## Index of the duplicated values:
indDuplicatedVec <- duplicated(d[,1]) | duplicated(d[,1], fromLast = TRUE)
myDuplicates <- d[indDuplicatedVec, ]
myUniques <- d[!indDuplicatedVec, ]
> myDuplicates
#V1 V2
#1 A 1
#3 A 1
#5 D 3
#6 D 4
> myUniques
#V1 V2
#2 B 1
#4 C 2
回答3:
Considering df
as your input, you can use dplyr
and try:
df %>% group_by(V1) %>% filter(n() > 1)
for the duplicates
and
df %>% group_by(V1) %>% filter(n() == 1)
for the unique entries.
回答4:
We can use data.table
library(data.table)
setDT(df)[, .SD[.N >1], V1]
来源:https://stackoverflow.com/questions/43510160/filtering-a-dataframe-showing-only-duplicates