R data.table filtering on group size

Question

I am trying to find all the records in my data.table for which there is more than one row with value v in field f.

For instance, we can use this data:

library(data.table)
dt <- data.table(f1=c(1,2,3,4,5), f2=c(1,1,2,3,3))

Looking for that property in field f2, we'd want to get the following (note the absence of the (3,2) tuple):

   f1 f2
1:  1  1
2:  2  1
3:  4  3
4:  5  3

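My first guess was dt[.N>2,list(.N),by=f2], but that keeps every group, including the one with .N==1. (In i, .N is the row count of the whole table, 5 here, rather than of each group, so .N>2 is TRUE for every row.)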

dt[.N>2,list(.N),by=f2]
   f2 N
1:  1 2
2:  2 1
3:  3 2
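A minimal sketch of why that first guess keeps every group, assuming a current data.table version in which .N is available in i: there it holds the row count of the whole table, so the condition is not evaluated per group. Counting per group with by first and then filtering the resulting counts does apply the condition group by group. The name `counts` is only illustrative.

```r
library(data.table)
dt <- data.table(f1 = c(1, 2, 3, 4, 5), f2 = c(1, 1, 2, 3, 3))

# In i, .N is the number of rows of the whole table, not of a group,
# so .N > 2 is a single TRUE that keeps every row
dt[, .N]          # 5
nrow(dt[.N > 2])  # 5

# Count rows per f2 group first, then filter the counts
# (counts is an illustrative name); only groups 1 and 3 remain
counts <- dt[, .N, by = f2][N > 1]
counts
```

This only lists the qualifying groups and their sizes; getting the original rows back still needs one more step, which is what the answers below address.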

The other easy guess, dt[duplicated(dt$f2)], doesn't do the trick either: duplicated() flags only the second and later occurrences of a value, so the first row of each duplicated group is missing from the result.

dt[duplicated(dt$f2)]
   f1 f2
1:  2  1
2:  5  3
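One way to repair the duplicated() guess, sketched with base R's fromLast argument: flag duplicates once scanning from the front and once scanning from the back, and keep a row if either flag is set. Every row whose f2 value occurs more than once gets flagged, and the original row order is preserved. The name `multi` is only illustrative.

```r
library(data.table)
dt <- data.table(f1 = c(1, 2, 3, 4, 5), f2 = c(1, 1, 2, 3, 3))

# duplicated(f2) misses the first occurrence of each repeated value;
# duplicated(f2, fromLast = TRUE) misses the last one.
# OR-ing them flags every occurrence of a repeated value.
# (multi is an illustrative name)
multi <- dt[duplicated(f2) | duplicated(f2, fromLast = TRUE)]
multi  # rows with f1 = 1, 2, 4, 5; the (3,2) row is gone
```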

So how can I get this done?

Edit: added an example.

Answer 1:

The question is not clear. Based on the title, it looks like we want to extract all groups whose number of rows (.N) is greater than 1:

DT[, if(.N>1) .SD, by=f]

But the mention of "value v in field f" makes it confusing.
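Applied to the question's dt, Answer 1's idiom might look like the sketch below (`by_group` and `keep` are illustrative names). Groups for which the if() fails return NULL and therefore contribute no rows, but the grouping column is moved to the front. The .I variant collects original row numbers instead; since those come back grouped by first appearance of f2, they are sorted before subsetting so the original row and column order is kept.

```r
library(data.table)
dt <- data.table(f1 = c(1, 2, 3, 4, 5), f2 = c(1, 1, 2, 3, 3))

# Answer 1's idiom: j returns .SD only for groups with more than one row
# (by_group is an illustrative name); columns come back as f2, f1
by_group <- dt[, if (.N > 1) .SD, by = f2]
by_group

# .I holds each group's original row numbers; keep those of the large
# groups, sort them to restore the original order (keep is illustrative)
keep <- sort(dt[, .I[.N > 1], by = f2]$V1)
dt[keep]
```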

Answer 2:

If I understand what you're after correctly, you'll need to do some compound queries:

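library(data.table)
DT <- data.table(v1 = 1:10, f = c(rep(1:3, 3), 4))
# Add the group size N to DT (by reference), keep rows of groups with more
# than one row, drop N from the filtered copy, and print the result
DT[, N := .N, by = f][N > 1][, N := NULL][]
#    v1 f
# 1:  1 1
# 2:  2 2
# 3:  3 3
# 4:  4 1
# 5:  5 2
# 6:  6 3
# 7:  7 1
# 8:  8 2
# 9:  9 3
# Note: DT itself still has the N column afterwards; remove it with DT[, N := NULL]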
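Because := adds N to DT by reference, the chain in Answer 2 leaves an extra column on DT. A sketch of a variant that leaves DT untouched, assuming the same DT: compute the group counts in a separate table, take the qualifying values of f, and subset with %in%. The names `big_groups` and `res` are only illustrative.

```r
library(data.table)
DT <- data.table(v1 = 1:10, f = c(rep(1:3, 3), 4))

# Count rows per f in a separate table and keep the f values with N > 1
# (big_groups is an illustrative name)
big_groups <- DT[, .N, by = f][N > 1, f]

# Keep the rows of DT whose f is one of those groups; DT is not modified
# (res is an illustrative name)
res <- DT[f %in% big_groups]
res
```

The trade-off is an extra grouped pass to build the counts table, in exchange for not mutating the input.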

Source: https://stackoverflow.com/questions/34427383/r-data-table-filtering-on-group-size

Tags

data.table
