I want to delete certain rows in a data frame when the number of rows with the same index is smaller than a pre-specified value.
> fof.6.5[1:15, 1:3]
draw
You could do this using dplyr (assuming your data is in a data frame called dt:
dt %>% group_by(draw) %>% filter(n() >= 5) %>% ungroup()
Or you could use table or xtabs:
tab <- xtabs(~ draw, dt)
dt[!dt$draw %in% as.numeric(names(which(tab < 5))), ]
Here is another option using data.table. Convert the 'data.frame' to 'data.table' (setDT(df1), grouped by 'draw', if the nrows (.N) are greater than or equal to 'n' then get the Subset of Data.table (.SD)
library(data.table)
n <- 5
setDT(df1)[, if(.N >= n) .SD, by = draw]
# draw Fund.ID Firm.ID
# 1: 1 1667 666
# 2: 1 1572 622
# 3: 1 1392 553
# 4: 1 248 80
# 5: 1 3223 332
# 6: 2 2959 1998
# 7: 2 2659 1561
# 8: 2 14233 2517
# 9: 2 10521 12579
#10: 2 3742 1045
If we want only 'n' number of rows, use the head
setDT(df1)[, if(.N >= n) head(.SD, n), by = draw]
Code below makes that:
library(dplyr)
fof.6.5 <- data.frame(draw = rep(1:4, c(5, 5, 3, 2)),
Fun.ID = rnorm(15),
Firm.ID = rnorm(15))
fof_filter <- fof.6.5 %>% group_by(draw) %>% mutate(freq = n()) %>%
filter(freq >= 5) %>% select(-freq)
fof_filter
group_by join (in someway) profiles with the same value of drawmutate add a new variable "freq" that give the number of profiles
per each group of draw and repeats it. filter selects only those
profiles with "freq" greater or equals to 5 select delete the "freq"
column because is not needed anymore