问题
I would like to subset my data frame to keep only groups that have 3 or more observations on DIFFERENT days. I want to get rid of groups that have less than 3 observations, or the observations they have are not from 3 different days.
Here is a sample data set:
Group Day
1 1
1 3
1 5
1 5
2 2
2 2
2 4
2 4
3 1
3 2
3 3
4 1
4 5
So for the above example, group 1 and group 3 will be kept and group 2 and 4 will be removed from the data frame.
I hope this makes sense, I imagine the solution will be quite simple but I can\'t work it out (I\'m quite new to R and not very fast at coming up with solutions to things like this). I thought maybe the diff function could come in handy but didn\'t get much further.
回答1:
With data.table you could do:
library(data.table)
DT[, if(uniqueN(Day) >= 3) .SD, by = Group]
which gives:
Group Day 1: 1 1 2: 1 3 3: 1 5 4: 1 5 5: 3 1 6: 3 2 7: 3 3
Or with dplyr
:
library(dplyr)
DT %>%
group_by(Group) %>%
filter(n_distinct(Day) >= 3)
which gives the same result.
回答2:
One idea using dplyr
library(dplyr)
df %>%
group_by(Group) %>%
filter(length(unique(Day)) >= 3)
#Source: local data frame [7 x 2]
#Groups: Group [2]
# Group Day
# (int) (int)
#1 1 1
#2 1 3
#3 1 5
#4 1 5
#5 3 1
#6 3 2
#7 3 3
回答3:
We can use base R
i1 <- rowSums(table(df1)!=0)>=3
subset(df1, Group %in% names(i1)[i1])
# Group Day
#1 1 1
#2 1 3
#3 1 5
#4 1 5
#9 3 1
#10 3 2
#11 3 3
Or a one-liner base R
would be
df1[with(df1, as.logical(ave(Day, Group, FUN = function(x) length(unique(x)) >=3))),]
来源:https://stackoverflow.com/questions/37835592/remove-groups-with-less-than-three-unique-observations