Question
I have a problem with the subset() function. How can I subset my data frame by a factor, keeping only the levels that have a minimum number of observations?
NAME CLASS COLOR VALUE
antonio B YELLOW 5
antonio B BLUE 8
antonio B BLUE 7
antonio B BLUE 12
luca C YELLOW 99
luca B YELLOW 87
luca B YELLOW 98
giovanni A BLUE 48
I would like to keep only the rows where the same combination of the three factors "NAME", "CLASS" and "COLOR" appears at least three times, in order to compute the mean of VALUE. In this case I would obtain:
NAME CLASS COLOR VALUE
antonio B BLUE mean
because antonio with CLASS B and COLOR BLUE is the only combination with at least three observations.
thank you so much
Nik
Answer 1:
You can use the table function as follows:
subset(df, table(FACTOR)[FACTOR] >= 3)
# FACTOR VALUE
# 1 ANTONIO 5
# 2 ANTONIO 8
# 3 ANTONIO 7
To help you understand, see what these return:
table(df$FACTOR)
table(df$FACTOR)[df$FACTOR]
table(df$FACTOR)[df$FACTOR] >= 3
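As a rough illustration of those three steps, here is a minimal sketch. The data frame below is a hypothetical reconstruction with a single FACTOR column (only the three ANTONIO rows come from the output shown above; the LUCA and GIOVANNI rows are assumptions added for contrast).

```r
# Hypothetical data: one grouping column and one value column.
df <- data.frame(
  FACTOR = c("ANTONIO", "ANTONIO", "ANTONIO", "LUCA", "LUCA", "GIOVANNI"),
  VALUE  = c(5, 8, 7, 99, 87, 48)
)

# Step 1: number of observations per level of FACTOR.
counts <- table(df$FACTOR)

# Step 2: look up each row's level in the table, giving one count per row.
# as.character() makes the lookup go by name whether FACTOR is stored
# as a character vector or as a factor.
per_row <- counts[as.character(df$FACTOR)]

# Step 3: logical vector marking rows whose level occurs at least 3 times.
keep <- as.vector(per_row >= 3)

# Rows that survive the filter.
kept <- df[keep, ]
print(kept)
```

The subset() call does the same thing in one line: it evaluates the condition inside the data frame, so the column can be written without the df$ prefix.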
You could also use the ave function to compute the number of observations:
subset(df, ave(VALUE, FACTOR, FUN = length) >= 3)
This last method may be a little more flexible if you have multiple factors like you asked in your comment and updated question. You can do:
subset(df, ave(VALUE, NAME, CLASS, COLOR, FUN = length) >= 3)
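The filter above keeps the qualifying rows but does not yet compute the mean asked for in the question. One possible way to finish, sketched here with the question's sample data, is to pass the filtered rows to aggregate(). The object names kept and result are only illustrative, and aggregate() is one option among several, not something the answer itself prescribes.

```r
# Sample data copied from the question.
df <- data.frame(
  NAME  = c("antonio", "antonio", "antonio", "antonio",
            "luca", "luca", "luca", "giovanni"),
  CLASS = c("B", "B", "B", "B", "C", "B", "B", "A"),
  COLOR = c("YELLOW", "BLUE", "BLUE", "BLUE",
            "YELLOW", "YELLOW", "YELLOW", "BLUE"),
  VALUE = c(5, 8, 7, 12, 99, 87, 98, 48)
)

# Keep rows whose NAME/CLASS/COLOR combination occurs at least 3 times.
# ave() returns, for every row, the size of the group that row belongs to.
kept <- subset(df, ave(VALUE, NAME, CLASS, COLOR, FUN = length) >= 3)

# Mean of VALUE for each remaining combination.
result <- aggregate(VALUE ~ NAME + CLASS + COLOR, data = kept, FUN = mean)
print(result)
```

With this data only the antonio / B / BLUE combination reaches three rows (8, 7 and 12), so the result has a single row.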
Source: https://stackoverflow.com/questions/13777317/subset-a-factor-by-its-number-of-observation