R data.table filtering on group size

Submitted by 我只是一个虾纸丫 on 2020-06-28 09:03:28

Question


I am trying to find all the records in my data.table for which there is more than one row with value v in field f.

For instance, we can use this data:

dt <- data.table(f1=c(1,2,3,4,5), f2=c(1,1,2,3,3))

If looking for that property in field f2, we'd get (note the absence of the (3,2) tuple)

    f1 f2
1:  1  1
2:  2  1
3:  4  3
4:  5  3  

My first guess was dt[.N>2,list(.N),by=f2], but that actually keeps entries with .N==1.

dt[.N>2,list(.N),by=f2]
   f2 N
1:  1 2
2:  2 1
3:  3 2

The other easy guess, dt[duplicated(dt$f2)], doesn't do the trick, as it keeps one of the 'duplicates' out of the results.

dt[duplicated(dt$f2)]
   f1 f2
1:  2  1
2:  5  3

So how can I get this done?

Edited to add example


Answer 1:


The question is not clear. Based on the title, it looks like we want to extract all groups with number of rows (.N) greater than 1.

DT[, if(.N>1) .SD, by=f]

The mention of a value v in field f is what makes the question confusing, though.
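To make that concrete, here is a minimal sketch of the same idiom applied to the sample data from the question, assuming the grouping field is f2 and that "more than one row" is the intended condition. The name res and the setcolorder() call are only there for illustration; the grouped result puts the by column first, so the column order is restored afterwards.

```r
library(data.table)

# Sample data from the question
dt <- data.table(f1 = c(1, 2, 3, 4, 5), f2 = c(1, 1, 2, 3, 3))

# For each f2 group, return all of its rows (.SD) only when the group
# has more than one row; groups of size 1 return NULL and are dropped.
res <- dt[, if (.N > 1) .SD, by = f2]

# The by column comes first in the result, so put the columns back
# in the order they have in dt.
setcolorder(res, names(dt))

# res now holds the four rows with f1 = 1, 2, 4, 5
print(res)
```

Note that .N only means "rows in the current group" when it is used in j together with by; in i, as in the question's first attempt, it is the total row count of the table, which is why that filter kept everything.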




Answer 2:


If I understand what you're after correctly, you'll need to do some compound queries:

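library(data.table)
DT <- data.table(v1 = 1:10, f = c(rep(1:3, 3), 4))
# Add the group size N by reference, keep groups with more than one row,
# then drop the helper column from the filtered result.
# Note: DT itself keeps the N column; only the filtered copy drops it.
DT[, N := .N, by = f][N > 1][, N := NULL][]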
#    v1 f
# 1:  1 1
# 2:  2 2
# 3:  3 3
# 4:  4 1
# 5:  5 2
# 6:  6 3
# 7:  7 1
# 8:  8 2
# 9:  9 3
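Since the chained form above adds N to DT by reference, here is a sketch of two variants that leave the original table untouched, run on the question's sample data and assuming the grouping field is f2. The names keep, res_a and res_b are made up for the example. The second variant is the question's duplicated() attempt with a second scan from the bottom, so that the first row of each repeated value is kept as well.

```r
library(data.table)

# Sample data from the question
dt <- data.table(f1 = c(1, 2, 3, 4, 5), f2 = c(1, 1, 2, 3, 3))

# Option A: collect the row numbers (.I) of groups with more than one row,
# then subset dt by those row numbers. V1 is the default name data.table
# gives to the unnamed result column.
keep <- dt[, .I[.N > 1], by = f2]$V1
res_a <- dt[keep]

# Option B: keep a row if its f2 value is repeated anywhere, scanning
# once from the top and once from the bottom.
res_b <- dt[duplicated(f2) | duplicated(f2, fromLast = TRUE)]

# Both results hold the rows with f1 = 1, 2, 4, 5, and dt is unchanged.
print(res_a)
print(res_b)
```

Both variants keep the original row and column order, which the if (.N > 1) .SD form does not guarantee when groups are interleaved.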


Source: https://stackoverflow.com/questions/34427383/r-data-table-filtering-on-group-size
