问题
I am trying to find all the records in my data.table
for which there is more than one row with value v in field f.
For instance, we can use this data:
dt <- data.table(f1=c(1,2,3,4,5), f2=c(1,1,2,3,3))
If looking for that property in field f2
, we'd get (note the absence of the (3,2) tuple)
f1 f2
1: 1 1
2: 2 1
3: 4 3
4: 5 3
My first guess was dt[.N>2,list(.N),by=f2]
, but that actually keeps entries with .N==1
.
dt[.N>2,list(.N),by=f2]
f2 N
1: 1 2
2: 2 1
3: 3 2
The other easy guess, dt[duplicated(dt$f2)]
, doesn't do the trick, as it keeps one of the 'duplicates' out of the results.
dt[duplicated(dt$f2)]
f1 f2
1: 2 1
2: 5 3
So how can I get this done?
Edited to add example
回答1:
The question is not clear. Based on the title, it looks like we want to extract all groups with number of rows (.N
) greater than 1.
DT[, if(.N>1) .SD, by=f]
But the value v in field f
is making it confusing.
回答2:
If I understand what you're after correctly, you'll need to do some compound queries:
library(data.table)
DT <- data.table(v1 = 1:10, f = c(rep(1:3, 3), 4))
DT[, N := .N, f][N > 2][, N := NULL][]
# v1 f
# 1: 1 1
# 2: 2 2
# 3: 3 3
# 4: 4 1
# 5: 5 2
# 6: 6 3
# 7: 7 1
# 8: 8 2
# 9: 9 3
来源:https://stackoverflow.com/questions/34427383/r-data-table-filtering-on-group-size