Return df with a columns values that occur more than once [duplicate]

Submitted by 时光毁灭记忆、已成空白 on 2019-11-26 17:16:03

Question


This question already has an answer here:

  • Subset data frame based on number of rows per group 3 answers

I have a data frame df, and I am trying to subset all rows whose value in column B occurs more than once in the dataset.

I tried using table to do it, but am having trouble subsetting from the table:

t<-table(df$B)

Then I try subsetting it using:

subset(df, table(df$B)>1)

And I get the error

"Error in x[subset & !is.na(subset)] : object of type 'closure' is not subsettable"

How can I subset my data frame using table counts?


Answer 1:


Here is a dplyr solution (using the data.frame dd from MrFlick's answer below):

library(dplyr)
newd <- dd %>% group_by(b) %>% filter(n() > 1)
newd
#    a b 
# 1  1 1 
# 2  2 1 
# 3  5 4 
# 4  6 4 
# 5  7 4 
# 6  9 6 
# 7 10 6 

Or, using data.table

library(data.table)
setDT(dd)[, if (.N > 1) .SD, by = b]

Or using base R

dd[dd$b %in% unique(dd$b[duplicated(dd$b)]),]
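The %in% step in the base R line matters because duplicated() only flags the second and later occurrences of a value. A minimal sketch of the difference, re-creating the dd example data so it runs on its own; the fromLast variant is an alternative idiom shown for comparison, not part of the original answer:

```r
# Example data, same values as the dd used in this thread
dd <- data.frame(a = 1:10, b = c(1, 1, 2, 3, 4, 4, 4, 5, 6, 6))

# duplicated() is FALSE for the first occurrence of each value,
# so subsetting on it alone drops one row from every repeated group
first_pass <- dd[duplicated(dd$b), ]
first_pass$a
# 2 6 7 10

# Scanning from both ends flags every member of a repeated group
both_ends <- dd[duplicated(dd$b) | duplicated(dd$b, fromLast = TRUE), ]
both_ends$a
# 1 2 5 6 7 9 10
```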



Answer 2:


May I suggest an alternative, faster way to do this with data.table? This returns the values of B that occur more than once:

library(data.table) ## 1.9.2
setDT(df)[, .N, by = B][N > 1L]$B

(or) to get the rows themselves, you can use .I (another special variable, see ?data.table), which gives the corresponding row numbers in df, along with .N as follows:

setDT(df)[df[, .I[.N > 1L], by=B]$V1]

(or) have a look at @mnel's answer for another variation (using yet another special variable, .SD).
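To make the two data.table expressions above concrete, here is a small sketch run step by step. The data frame is hypothetical (the question does not show its data); only the column name B is taken from the question, and the data.table package is assumed to be installed:

```r
library(data.table)

# Hypothetical example data with a column named B, as in the question
df <- data.frame(A = 1:10, B = c(1, 1, 2, 3, 4, 4, 4, 5, 6, 6))
setDT(df)                        # converts df to a data.table by reference

counts <- df[, .N, by = B]       # one row per value of B, with its count in N
dup_values <- counts[N > 1L]$B   # values of B that occur more than once
dup_values
# 1 4 6

# .I holds row numbers of df; keep them only for groups with more than one row
rows <- df[, .I[.N > 1L], by = B]$V1
rows
# 1 2 5 6 7 9 10

result <- df[rows]               # the rows themselves
```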




Answer 3:


Using table() isn't the best because its counts are per distinct value, so you then have to join them back to the original rows of the data.frame. The ave function makes it easier to calculate row-level values for different groups. For example:

dd<-data.frame(
    a=1:10,
    b=c(1,1,2,3,4,4,4,5,6, 6)
)


dd[with(dd, ave(b,b,FUN=length))>1, ]
#subset(dd, ave(b,b,FUN=length)>1)    #same thing

    a b
1   1 1
2   2 1
5   5 4
6   6 4
7   7 4
9   9 6
10 10 6

Here, for each distinct value of b, ave computes the length of that group (that is, how many times the value occurs) and returns that count in the position of every row holding the value. We then use the resulting vector to subset.
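As a rough illustration of the point about table(), the sketch below prints the per-row counts that ave produces and then rebuilds the same vector from table() by looking each row's value up by name. The variable names n_per_row, counts, n_from_table and via_table are made up for this example:

```r
dd <- data.frame(a = 1:10, b = c(1, 1, 2, 3, 4, 4, 4, 5, 6, 6))

# One count per row: each element is the size of that row's b group
n_per_row <- ave(dd$b, dd$b, FUN = length)
n_per_row
# 2 2 1 1 3 3 3 1 2 2

# table() gives one count per distinct value, so the counts have to be
# mapped back to rows, here by indexing the table with each row's value as a name
counts <- table(dd$b)
n_from_table <- as.vector(counts[as.character(dd$b)])

via_table <- dd[n_from_table > 1, ]
via_table$a
# 1 2 5 6 7 9 10
```

The name lookup relies on as.character(dd$b) matching the table's names, which holds for these whole-number values; ave avoids that extra step entirely.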



Source: https://stackoverflow.com/questions/24503279/return-df-with-a-columns-values-that-occur-more-than-once
