Remove IDs that occur fewer than X times in R

Posted by 删除回忆录丶 on 2019-12-17 05:13:31

Question


I have a data frame df and I would like to remove people who have fewer than X rows in it. For instance, in this toy example, I would like to retain only people who have >= 5 rows.

df
   names  fruit
4   john   kiwi
7   john  apple
9   john banana
13  john orange
14  john  apple
2   mary orange
5   mary  apple
8   mary orange
10  mary  apple
12  mary  apple
1    tom  apple
3    tom banana
6    tom  apple
11   tom   kiwi

Example output:

df
   names  fruit
4   john   kiwi
7   john  apple
9   john banana
13  john orange
14  john  apple
2   mary orange
5   mary  apple
8   mary orange
10  mary  apple
12  mary  apple

Thanks in advance!


Answer 1:


You can use table like this:

df[df$names %in% names(table(df$names))[table(df$names) >= 5],]
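To try this end to end, here is a minimal sketch that rebuilds the toy data from the question and applies the same table() idea, computing the counts only once. The variable names min_rows, counts, keep and result are illustrative and not part of the original answer, and the rebuilt df does not reproduce the original row names.

```r
# Rebuild the toy data from the question (row names differ from the printout)
df <- data.frame(
  names = rep(c("john", "mary", "tom"), times = c(5, 5, 4)),
  fruit = c("kiwi", "apple", "banana", "orange", "apple",
            "orange", "apple", "orange", "apple", "apple",
            "apple", "banana", "apple", "kiwi"),
  stringsAsFactors = FALSE
)

min_rows <- 5  # illustrative threshold name (the "X" in the question)

# Count rows per name once, then keep the names that meet the threshold
counts <- table(df$names)
keep <- names(counts)[counts >= min_rows]

# Subset the data frame to rows whose name is in the keep list
result <- df[df$names %in% keep, ]
print(result)
```

Storing the counts in a variable avoids calling table() twice and makes the threshold easy to change.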



Answer 2:


Here's a data.table solution using the built-in .N symbol, which the ?data.table help file describes as follows: ‘.N’ is an integer, length 1, containing the number of rows in the group.

# create a similar reproducible example
library(data.table)
dat <- data.table(names=rep(letters[1:3],c(5,5,3)),var=1:13)

Remove the rows:

dat[, cnt:=.N, by=names][cnt >= 5]

Though I felt there must be a way to do this without assigning a new variable. And now there is, thanks to @mnel in the comments:

dat[,if(.N>=5).SD,by=names]

For each group defined by by, this returns the group's sub-data.table .SD only if the number of rows in the group, .N, is greater than or equal to 5. Groups that fail the test return NULL and are dropped from the result. It is pretty much equivalent to the more traditional R subsetting syntax of:

dat[,.SD[.N >= 5],by=names]
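The approaches above can be compared side by side. The sketch below assumes the data.table package is installed; res1 and res2 are illustrative names. One point worth knowing: := adds the cnt column to dat by reference, so the first approach changes dat itself unless it is run on a copy.

```r
library(data.table)

# Same toy data as above: groups a and b have 5 rows, c has 3
dat <- data.table(names = rep(letters[1:3], c(5, 5, 3)), var = 1:13)

# Approach 1: helper column. `:=` modifies by reference, so work on a copy
# to leave `dat` untouched, then drop the helper column from the result.
res1 <- copy(dat)[, cnt := .N, by = names][cnt >= 5][, cnt := NULL]

# Approach 2: return .SD only for groups that are large enough
res2 <- dat[, if (.N >= 5) .SD, by = names]

# Both results keep only the rows of groups a and b
print(res1)
print(res2)
```

The second form needs no helper column, so nothing has to be copied or cleaned up afterwards.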


Source: https://stackoverflow.com/questions/18302610/remove-ids-that-occur-x-times-r
