Question
I have a data frame df, and I am trying to keep all rows whose value in column B
occurs more than once in the dataset.
I tried using table to do it, but am having trouble subsetting from the table:
t<-table(df$B)
Then I try subsetting it using:
subset(df, table(df$B)>1)
And I get the error
"Error in x[subset & !is.na(subset)] : object of type 'closure' is not subsettable"
How can I subset my data frame using table counts?
Answer 1:
Here is a dplyr solution (using MrFlick's data.frame dd from Answer 3)
library(dplyr)
newd <- dd %>% group_by(b) %>% filter(n() > 1)
newd
# a b
# 1 1 1
# 2 2 1
# 3 5 4
# 4 6 4
# 5 7 4
# 6 9 6
# 7 10 6
Or, using data.table
library(data.table)
setDT(dd)[, if (.N > 1) .SD, by = b]
Or using base R
dd[dd$b %in% unique(dd$b[duplicated(dd$b)]),]
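The base R line above can also be written without the %in% lookup. The following is only a sketch, assuming the same dd as in Answer 3 (rebuilt here so the snippet runs on its own); the names keep, res_dup and res_in are illustrative and not from the original answer.

```r
# Same example data as in Answer 3, rebuilt so this runs standalone
dd <- data.frame(a = 1:10, b = c(1, 1, 2, 3, 4, 4, 4, 5, 6, 6))

# duplicated() flags the second and later occurrences of a value;
# scanning from the end as well (fromLast = TRUE) also flags the first one
keep <- duplicated(dd$b) | duplicated(dd$b, fromLast = TRUE)
res_dup <- dd[keep, ]

# The %in% version from the answer, for comparison
res_in <- dd[dd$b %in% unique(dd$b[duplicated(dd$b)]), ]
```

Both forms should select the same rows; the two-sided duplicated() form just skips building the intermediate vector of repeated values.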
Answer 2:
May I suggest an alternative, faster way to do this with data.table?
require(data.table) ## 1.9.2
setDT(df)[, .N, by=B][N > 1L]$B
(or) you can couple .I (another special variable, see ?data.table), which gives the corresponding row numbers in df, with .N as follows:
setDT(df)[df[, .I[.N > 1L], by=B]$V1]
(or) have a look at @mnel's answer for another variation (using yet another special variable, .SD).
Answer 3:
Using table() isn't the best approach, because you then have to rejoin the counts to the original rows of the data.frame. The ave function makes it easier to calculate row-level values for different groups. For example:
dd<-data.frame(
a=1:10,
b=c(1,1,2,3,4,4,4,5,6, 6)
)
dd[with(dd, ave(b,b,FUN=length))>1, ]
#subset(dd, ave(b,b,FUN=length)>1) #same thing
a b
1 1 1
2 2 1
5 5 4
6 6 4
7 7 4
9 9 6
10 10 6
Here, for each level of b, it counts the length of b, which is really just the number of b's, and returns that count to the appropriate row for each value. Then we use that to subset.
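To make the explanation above concrete, here is a small sketch that prints the per-row counts ave() produces and, since the question asked about table(), shows one way to map table counts back to rows by name. The variable names n_per_row, tab, repeated and res_tab are illustrative assumptions, not part of the original answer.

```r
dd <- data.frame(a = 1:10, b = c(1, 1, 2, 3, 4, 4, 4, 5, 6, 6))

# Per-row group size: every row receives the count of its own b value
n_per_row <- ave(dd$b, dd$b, FUN = length)
print(n_per_row)
# 2 2 1 1 3 3 3 1 2 2

# table() returns one count per distinct value, so the counts have to be
# matched back to the rows through the value names
tab <- table(dd$b)
repeated <- names(tab)[tab > 1]
res_tab <- dd[as.character(dd$b) %in% repeated, ]
```

The table() route works, but it needs the extra name-matching step, which is the rejoining cost the answer refers to.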
Source: https://stackoverflow.com/questions/24503279/return-df-with-a-columns-values-that-occur-more-than-once