问题
I want to select a subset of rows based upon a condition after grouping by a specific column in R data table . Take Mtcars data for an example .
dt_mtcars <- as.data.table(mtcars)
dt_mtcars[,.N,by=.(hp)]
hp N
1: 110 3
2: 93 1
3: 175 3
4: 105 1
5: 245 2
6: 62 1
7: 95 1
8: 123 2
9: 180 3
10: 205 1
11: 215 1
12: 230 1
13: 66 2
14: 52 1
15: 65 1
16: 97 1
17: 150 2
18: 91 1
19: 113 1
20: 264 1
21: 335 1
22: 109 1
hp N
Now what I am trying to achieve is like selecting a sub set of data based on the count for e.g in here I want to have all the data in dt_mtcars where N >1 if I try something like this (which is conceptually what I am trying to achieve) it shows all the rows rather than getting the rows where N is greater than 1 .
dt_mtcars[.N>1,.SD,by=.(hp)]
Please assist .
回答1:
We can do this using if
dt_mtcars[, if(.N>1) .SD ,hp]
回答2:
You can do it in two steps:
dt_mtcars[,Count:=.N, by=.(hp)][Count>1, .SD, .SDcols = rev(rev(1:length(names(dt_mtcars)))[-1])]
EDIT: realized mtcars does not have just those two rows, updated SDcols
EDIT: Fixed to remove count
来源:https://stackoverflow.com/questions/34233417/how-do-i-select-a-subset-of-rows-after-group-by-a-specific-column-in-r-data-tabl