问题
I want to delete certain rows in a data frame when the number of rows with the same index is smaller than a pre-specified value.
> fof.6.5[1:15, 1:3]
draw Fund.ID Firm.ID
1 1 1667 666
2 1 1572 622
3 1 1392 553
4 1 248 80
5 1 3223 332
6 2 2959 1998
7 2 2659 1561
8 2 14233 2517
9 2 10521 12579
10 2 3742 1045
11 3 9093 10121
12 3 15681 21626
13 3 26371 70170
14 4 27633 52720
15 4 13751 656
In this example, I want each index to have 5 rows. The third draw (which is my index) has fewer than 5 rows. How can I delete the draws like the third one if they have fewer than 5 rows?
回答1:
You could do this using dplyr
(assuming your data is in a data frame called dt
:
dt %>% group_by(draw) %>% filter(n() >= 5) %>% ungroup()
Or you could use table
or xtabs
:
tab <- xtabs(~ draw, dt)
dt[!dt$draw %in% as.numeric(names(which(tab < 5))), ]
回答2:
Here is another option using data.table
. Convert the 'data.frame' to 'data.table' (setDT(df1)
, grouped by 'draw', if
the nrows (.N
) are greater than or equal to 'n' then get the Subset of Data.table (.SD
)
library(data.table)
n <- 5
setDT(df1)[, if(.N >= n) .SD, by = draw]
# draw Fund.ID Firm.ID
# 1: 1 1667 666
# 2: 1 1572 622
# 3: 1 1392 553
# 4: 1 248 80
# 5: 1 3223 332
# 6: 2 2959 1998
# 7: 2 2659 1561
# 8: 2 14233 2517
# 9: 2 10521 12579
#10: 2 3742 1045
If we want only 'n' number of rows, use the head
setDT(df1)[, if(.N >= n) head(.SD, n), by = draw]
回答3:
Code below makes that:
library(dplyr)
fof.6.5 <- data.frame(draw = rep(1:4, c(5, 5, 3, 2)),
Fun.ID = rnorm(15),
Firm.ID = rnorm(15))
fof_filter <- fof.6.5 %>% group_by(draw) %>% mutate(freq = n()) %>%
filter(freq >= 5) %>% select(-freq)
fof_filter
group_by
join (in someway) profiles with the same value of drawmutate
add a new variable "freq" that give the number of profiles per each group of draw and repeats it.filter
selects only those profiles with "freq" greater or equals to 5select
delete the "freq" column because is not needed anymore
来源:https://stackoverflow.com/questions/37497763/r-delete-rows-in-data-frame-where-nrow-of-index-is-smaller-than-certain-value