Specific removing all duplicates with R [duplicate]

橙三吉。 提交于 2019-12-12 22:01:58

问题


For example I have two columns:

 Var1 Var2
 1     12
 1     65
 2     68
 2     98
 3     49
 3     24
 4      8
 5     67
 6     12

And I need to display only values which are unique for column Var1:

 Var1 Var2
 4      8
 5     67
 6     12

I can do you like this:

 mydata=mydata[!unique(mydata$Var1),]

But when I use the same formula for my large data set with about 1 million observations, nothing happens - the sample size is still the same. Could you please explain my why?

Thank you!


回答1:


With data.table (as it seem to be tagged with it) I would do

indx <- setDT(DT)[, .I[.N == 1], by = Var1]$V1 
DT[indx]
#    Var1 Var2
# 1:    4    8
# 2:    5   67
# 3:    6   12

Or... as @eddi reminded me, you can simply do

DT[, if(.N == 1) .SD, by = Var1]

Or (per the mentioned duplicates) with v >= 1.9.5 you could also do something like

setDT(DT, key = "Var1")[!(duplicated(DT) | duplicated(DT, fromLast = TRUE))]



回答2:


You can use this:

df <- data.frame(Var1=c(1,1,2,2,3,3,4,5,6), Var2=c(12,65,68,98,49,24,8,67,12) );
df[ave(1:nrow(df),df$Var1,FUN=length)==1,];
##   Var1 Var2
## 7    4    8
## 8    5   67
## 9    6   12

This will work even if the Var1 column is not ordered, because ave() does the necessary work to collect groups of equal elements (even if they are non-consecutive in the grouping vector) and map the result of the function call (length() in this case) back to each element that was a member of the group.


Regarding your code, it doesn't work because this is what unique() and its negation returns:

unique(df$Var1);
## [1] 1 2 3 4 5 6
!unique(df$Var1);
## [1] FALSE FALSE FALSE FALSE FALSE FALSE

As you can see, unique() returns the actual unique values from the argument vector. Negation returns true for zero and false for everything else.

Thus, you end up row-indexing using a short logical vector (it will be short if there were any duplicates removed by unique()) consisting of TRUE where there were zeroes, and FALSE otherwise.



来源:https://stackoverflow.com/questions/29955163/specific-removing-all-duplicates-with-r

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!