subset based on frequency level [duplicate]

This question already has an answer here:

Subset data frame based on number of rows per group 2 answers

I want to generate a df that selects rows associated with an "ID" that in turn is associated with a variable called cutoff. For this example, I set the cutoff to 9, meaning that I want to select rows in df1 whose ID value is associated with more than 9 rows. The last line of my code generates a df that I don't understand. The correct df would have 24 rows, all with either a 3 or a 4 in the ID column. Can someone explain what my last line of code is actually doing and suggest a different approach?

set.seed(123)
ID<-rep(c(1,2,3,4,5),times=c(5,7,9,11,13))
sub1<-rnorm(45)
sub2<-rnorm(45)
df1<-data.frame(ID,sub1,sub2)
IDfreq<-count(df1,"ID")
cutoff<-9
df2<-subset(df1,subset=(IDfreq$freq>cutoff))

df1[ df1$ID %in%  names(table(df1$ID))[table(df1$ID) >9] , ]

This will test to see if the df1$ID value is in a category with more than 9 values. If it is, then the logical element for the returned vector will be TRUE and in turn that as the "i" argument will cause the [-function to return the entire row since the "j" item is empty.

See:

?`[`
?'%in%'

Maybe closer to what you had in mind is to create a vector of frequencies using ave:

subset(df1, ave(ID, ID, FUN = length) > cutoff)

Using dplyr

library(dplyr)
 df1 %>% 
 group_by(ID) %>% 
 filter(n()>cutoff)

来源：https://stackoverflow.com/questions/24835233/subset-based-on-frequency-level

标签

subset

frequency

易学教程内所有资源均来自网络或用户发布的内容，如有违反法律规定的内容欢迎反馈！
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!