Return df with a column's values that occur more than once [duplicate]

那年仲夏 submitted on 2019-11-27 15:36:18

Here is a dplyr solution (using mrFlick's data.frame dd, defined further down):

library(dplyr)
newd <- dd %>% group_by(b) %>% filter(n() > 1)  # keep only groups of b with more than one row
newd
#    a b 
# 1  1 1 
# 2  2 1 
# 3  5 4 
# 4  6 4 
# 5  7 4 
# 6  9 6 
# 7 10 6 

Or, using data.table

library(data.table)
setDT(dd)[, if (.N > 1) .SD, by = b]  # return a group's rows only when the group has more than one

Or using base R

dd[dd$b %in% unique(dd$b[duplicated(dd$b)]),]
Mike.Gahan
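
The base R one-liner above packs three steps into one expression. As a rough sketch, it can be unpacked as follows; the intermediate names (is_repeat, repeated_values, result, keep) are made up for illustration, and dd is recreated here so the snippet runs on its own.

```r
# Same example data as used elsewhere on this page
dd <- data.frame(a = 1:10, b = c(1, 1, 2, 3, 4, 4, 4, 5, 6, 6))

# duplicated() flags the second and later occurrences of each value
is_repeat <- duplicated(dd$b)

# The distinct values that occur more than once
repeated_values <- unique(dd$b[is_repeat])

# Keep every row whose b is one of those values (including first occurrences)
result <- dd[dd$b %in% repeated_values, ]

# An equivalent mask: flagged when scanning from the front or from the back
keep <- duplicated(dd$b) | duplicated(dd$b, fromLast = TRUE)
```

Both result and dd[keep, ] should contain the same rows; the second form avoids the %in% lookup.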

May I suggest an alternative, faster way to do this with data.table?

require(data.table) ## 1.9.2
setDT(df)[, .N, by=B][N > 1L]$B

(or) you can combine .N with .I (another special variable, see ?data.table), which gives the row numbers in df for each group, as follows:

setDT(df)[df[, .I[.N > 1L], by=B]$V1]

(or) have a look at @mnel's answer for another variation (using yet another special variable, .SD).
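
To see what the .I variant does, it may help to look at the intermediate table before it is used for subsetting. The sketch below assumes the data.table package is installed and uses a made-up df with columns A and B (the original only mentions a column B); idx and rows are illustrative names.

```r
library(data.table)

# Hypothetical data; only the column name B comes from the answer above
df <- data.table(A = 1:10, B = c(1, 1, 2, 3, 4, 4, 4, 5, 6, 6))

# For each group of B, .I holds the row numbers of that group in df.
# Indexing with the single logical .N > 1L keeps all of them or none.
idx <- df[, .I[.N > 1L], by = B]

# idx has columns B and V1; V1 is the vector of row numbers to keep
rows <- df[idx$V1]
```

Here idx$V1 is just an integer vector, so the final step is ordinary row subsetting.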

Using table() isn't ideal because you then have to join the counts back to the original rows of the data.frame. The ave function makes it easier to calculate row-level values for different groups. For example:

dd<-data.frame(
    a=1:10,
    b=c(1,1,2,3,4,4,4,5,6, 6)
)


dd[with(dd, ave(b,b,FUN=length))>1, ]
#subset(dd, ave(b,b,FUN=length)>1)    #same thing

    a b
1   1 1
2   2 1
5   5 4
6   6 4
7   7 4
9   9 6
10 10 6

Here, for each distinct value of b, ave() computes the length of that group (that is, how many times the value occurs) and returns that count on every row holding the value. We then keep only the rows whose count is greater than 1.
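
Printing the intermediate vector makes this clearer. The sketch below also shows, for comparison, the extra lookup step a table() based approach would need; group_size, counts and via_table are illustrative names, not part of the answer above.

```r
dd <- data.frame(a = 1:10, b = c(1, 1, 2, 3, 4, 4, 4, 5, 6, 6))

# One count per row, aligned with dd
group_size <- ave(dd$b, dd$b, FUN = length)
group_size
# 2 2 1 1 3 3 3 1 2 2

# table() gives one count per distinct value, so it has to be
# looked up by name to get back to one count per row
counts <- table(dd$b)
via_table <- as.vector(counts[as.character(dd$b)])

# Either vector can drive the subset
dd[group_size > 1, ]
```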
