Question
I have a data frame from which I would like to remove every row whose value in column A occurs more than once. For instance, my data frame looks like:
> df <- data.frame(A = c("Happy", "Happy", "Sad", "Confused", "Mad", "Mad"), B = c(1, 2, 3, 4, 5, 6))
> df
A B
1 Happy 1
2 Happy 2
3 Sad 3
4 Confused 4
5 Mad 5
6 Mad 6
I want to keep only the rows whose entry in A appears exactly once, to get:
A B
1 Sad 3
2 Confused 4
Answer 1:
You can try duplicated, applied from both ends so that every copy of a repeated value is flagged:
df[!(duplicated(df$A)|duplicated(df$A,fromLast=TRUE)),]
# A B
#3 Sad 3
#4 Confused 4
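To see why both calls are needed, it can help to look at the two logical vectors separately. This is only a sketch on the same example data; the names fwd, bwd and res are illustrative and not part of the original answer.

```r
df <- data.frame(A = c("Happy", "Happy", "Sad", "Confused", "Mad", "Mad"),
                 B = c(1, 2, 3, 4, 5, 6))

# Scanning from the top flags the second and later copies of each value
fwd <- duplicated(df$A)
# FALSE  TRUE FALSE FALSE FALSE  TRUE

# Scanning from the bottom flags every copy except the last one
bwd <- duplicated(df$A, fromLast = TRUE)
#  TRUE FALSE FALSE FALSE  TRUE FALSE

# A row belongs to a repeated value if either scan flags it
res <- df[!(fwd | bwd), ]
```

Using only duplicated(df$A) would still keep the first "Happy" and the first "Mad", which is not what the question asks for.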
or
df[df$A %in% with(as.data.frame(table(df$A)), Var1[Freq==1]),]
# A B
#3 Sad 3
#4 Confused 4
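The table one-liner can be unpacked into steps, which may be easier to read. A minimal sketch, assuming the same example data; counts, singles and res2 are hypothetical names introduced here.

```r
df <- data.frame(A = c("Happy", "Happy", "Sad", "Confused", "Mad", "Mad"),
                 B = c(1, 2, 3, 4, 5, 6))

# Frequency of each distinct value in A
counts <- table(df$A)

# Values that occur exactly once
singles <- names(counts)[counts == 1]

# Keep the rows whose A is one of those values; row order of df is preserved
res2 <- df[df$A %in% singles, ]
```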
or
df[colSums(sapply(df$A, `==`, df$A))==1,]
# A B
#3 Sad 3
#4 Confused 4
or
df[colSums(Vectorize(function(x) x==df$A)(df$A))==1,]
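Both colSums variants build a matrix comparing every element of A with every other element and then count the matches per column. As an illustration, the same matrix can be built with outer (a swapped-in technique, not the one used in the answer); a, m and res3 are hypothetical names.

```r
df <- data.frame(A = c("Happy", "Happy", "Sad", "Confused", "Mad", "Mad"),
                 B = c(1, 2, 3, 4, 5, 6))

a <- as.character(df$A)

# m[i, j] is TRUE when a[i] equals a[j]
m <- outer(a, a, "==")

# Each column sum counts how often that element occurs, including itself
colSums(m)
# 2 2 1 1 2 2

res3 <- df[colSums(m) == 1, ]
```

The matrix has one cell per pair of rows, so this family of approaches is likely to be slow and memory-hungry on large data frames.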
or using data.table (similar to @beginneR's use of ave)
library(data.table)
setDT(df)[,.SD[.N==1], by=A]
# A B
#1: Sad 3
#2: Confused 4
or
setDT(df)[df[,.I[.N==1], by=A]$V1]
# A B
#1: Sad 3
#2: Confused 4
Answer 2:
akrun seems to be collecting different methods, so here's another one in base:
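# seq_along() is used rather than as.numeric() so that this also works, without a coercion warning, when A is a character column
df[ave(seq_along(df$A), df$A, FUN = length) == 1,]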
# A B
#3 Sad 3
#4 Confused 4
(I guess the one with duplicated would be the most commonly used method)
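For the ave approach, it may help to see the intermediate vector: ave returns, for each row, the result of FUN applied to that row's group, so with FUN = length every row gets its group size. A small sketch on the example data; grp_size and res4 are illustrative names only.

```r
df <- data.frame(A = c("Happy", "Happy", "Sad", "Confused", "Mad", "Mad"),
                 B = c(1, 2, 3, 4, 5, 6))

# Group size of A, repeated for every row of its group
grp_size <- ave(seq_along(df$A), df$A, FUN = length)
# 2 2 1 1 2 2

# Rows whose group has exactly one member
res4 <- df[grp_size == 1, ]
```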
Or using dplyr:
library(dplyr)
group_by(df, A) %>% filter(n() == 1)
Source: https://stackoverflow.com/questions/27067556/r-remove-duplicate-rows