I have data like below:
ID category class
1 a m
1 a s
1 b s
2 a m
3 b s
4 c s
5 d s
I want to subset the data by only including those "ID" which have several (> 1) different categories.
My expected output:
ID category class
1 a m
1 a s
1 b s
Is there a way to do so?
I tried
library(dplyr)
df %>%
group_by(ID) %>%
filter(n_distinct(category, class) > 1)
But it gave me an error:
# Error: expecting a single value
Using data.table
library(data.table) #see: https://github.com/Rdatatable/data.table/wiki for more
setDT(data) #convert to native 'data.table' type by reference
data[ , if(uniqueN(category) > 1) .SD, by = ID]
uniqueN is data.table's (fast) native shorthand for length(unique()), and .SD is just the whole data.table for the current group (in more general cases, it can represent a subset of columns, e.g. when the .SDcols argument is used). So basically the middle statement (j, the column selection argument) says to return all columns and rows associated with an ID for which there are at least two distinct values of category.
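As a minimal sketch, here is that one-liner run on the sample data from the question. Building the table with data.table() is an assumption about how the data is loaded, and the name result is only for illustration.

```r
library(data.table)

# Sample data from the question, built directly as a data.table
data <- data.table(
  ID       = c(1, 1, 1, 2, 3, 4, 5),
  category = c("a", "a", "b", "a", "b", "c", "d"),
  class    = c("m", "s", "s", "m", "s", "s", "s")
)

# For each ID, return its rows (.SD) only if it has more than one
# distinct category; groups that fail the test return NULL and are dropped
result <- data[, if (uniqueN(category) > 1) .SD, by = ID]
print(result)
```

Only ID 1 has two distinct categories (a and b), so only its three rows should remain, matching the expected output in the question.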
Use the by argument to extend this to a case involving counts of multiple columns.
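One possible reading of this, sketched under the assumption that the by argument meant here is the one belonging to uniqueN: passing .SD together with by = c("category", "class") counts distinct category/class combinations within each ID. The extra ID 6 rows are hypothetical, added only so the two filters give different results.

```r
library(data.table)

# Sample data from the question plus a hypothetical ID 6 that has a single
# category but two classes
data2 <- data.table(
  ID       = c(1, 1, 1, 2, 3, 4, 5, 6, 6),
  category = c("a", "a", "b", "a", "b", "c", "d", "a", "a"),
  class    = c("m", "s", "s", "m", "s", "s", "s", "m", "s")
)

# Count distinct (category, class) pairs per ID; keep IDs with more than one
result_multi <- data2[
  , if (uniqueN(.SD, by = c("category", "class")) > 1) .SD,
  by = ID
]
print(result_multi)
```

With this filter ID 6 is kept as well as ID 1, whereas filtering on uniqueN(category) alone would drop it.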
Source: https://stackoverflow.com/questions/33291658/select-groups-with-more-than-one-distinct-value-per-group