subset() a factor by its number of observations

Question

I have a problem with the subset() function. How can I subset a factor in my data frame by its number of observations?

   NAME      CLASS   COLOR    VALUE
   antonio   B       YELLOW       5
   antonio   B       BLUE         8
   antonio   B       BLUE         7
   antonio   B       BLUE        12
   luca      C       YELLOW      99
   luca      B       YELLOW      87
   luca      B       YELLOW      98
   giovanni  A       BLUE        48

I would like to keep only the rows where the same combination of the three factors "NAME", "CLASS" and "COLOR" appears at least three times, so that I can compute the mean of VALUE for it. In this case I would obtain:

   NAME      CLASS   COLOR    VALUE
   antonio   B       BLUE     mean

because antonio / B / BLUE is the only combination with at least three observations.

thank you so much

Nik

Answer 1:

You can use the table function as follows:

subset(df, table(FACTOR)[FACTOR] >= 3)
#    FACTOR VALUE
# 1 ANTONIO     5
# 2 ANTONIO     8
# 3 ANTONIO     7

To help you understand, see what these return:

table(df$FACTOR)
table(df$FACTOR)[df$FACTOR]
table(df$FACTOR)[df$FACTOR] >= 3
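
As a minimal sketch of what those three expressions produce, here is a small made-up data frame (the FACTOR and VALUE columns follow the answer's output; the LUCA and GIOVANNI rows are assumptions added for illustration). The names counts, per_row, keep and result are hypothetical helpers, not part of the original answer.

```r
# Hypothetical data; only the three ANTONIO rows come from the answer's output
df <- data.frame(
  FACTOR = c("ANTONIO", "ANTONIO", "ANTONIO", "LUCA", "LUCA", "GIOVANNI"),
  VALUE  = c(5, 8, 7, 99, 87, 48),
  stringsAsFactors = TRUE
)

counts  <- table(df$FACTOR)      # number of observations per level
per_row <- counts[df$FACTOR]     # each row's group size, in row order
keep    <- as.vector(per_row >= 3)  # TRUE for rows whose level occurs >= 3 times

result <- df[keep, ]             # same rows that subset() would keep
print(result)
```

Indexing the table by the factor itself looks up each row's count, so the comparison yields one logical value per row, which is exactly what subset() needs.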

You could also use the ave function to compute the number of observations:

subset(df, ave(VALUE, FACTOR, FUN = length) >= 3)

The ave approach extends more easily to several factors, as in your comment and updated question. You can do:

subset(df, ave(VALUE, NAME, CLASS, COLOR, FUN = length) >= 3)
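
To get from the filtered rows to the mean the question asks for, one possible follow-up is to pass the result to aggregate(). The sketch below rebuilds the question's table by hand; the aggregate() step and the names kept and means are additions for illustration, not part of the original answer.

```r
# The question's data, typed in by hand
df <- data.frame(
  NAME  = c("antonio", "antonio", "antonio", "antonio",
            "luca", "luca", "luca", "giovanni"),
  CLASS = c("B", "B", "B", "B", "C", "B", "B", "A"),
  COLOR = c("YELLOW", "BLUE", "BLUE", "BLUE",
            "YELLOW", "YELLOW", "YELLOW", "BLUE"),
  VALUE = c(5, 8, 7, 12, 99, 87, 98, 48)
)

# Keep rows whose NAME/CLASS/COLOR combination occurs at least three times
kept <- subset(df, ave(VALUE, NAME, CLASS, COLOR, FUN = length) >= 3)

# One row per remaining combination, with the mean of VALUE
means <- aggregate(VALUE ~ NAME + CLASS + COLOR, data = kept, FUN = mean)
print(means)
```

With this data only antonio / B / BLUE survives the filter (values 8, 7 and 12), so means has a single row with VALUE equal to 9.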

Source: https://stackoverflow.com/questions/13777317/subset-a-factor-by-its-number-of-observation

Tags

frequency

subset
