Categorize dataframe by percentile in R

Posted by 限于喜欢 on 2020-01-06 09:07:09

Question


I have the following data:

set.seed(15)
ddf <- data.frame(
    gp1 = sample(1:3, 200, replace=T), 
    gp2 = sample(c('a','b'), 200, replace=T), 
    param = sample(10:20, 200, replace=T) 
)
head(ddf)
  gp1 gp2 param
1   2   a    18
2   1   b    11
3   3   a    15
4   2   b    20
5   2   a    17
6   3   b    11

I need to create another column called 'category' that has a value of 1 if 'param' for that row is above the 75th percentile of its gp1 and gp2 group (and 0 otherwise).

I tried the following, but I am not sure whether it is correct:

ddf$category = with(ddf, ifelse(param>quantile(ddf[ddf$gp1==gp1 & ddf$gp2==gp2,]$param, .75, na.rm=T), 1, 0)  )

Is the above code correct, and if not, how can this be done? Thanks for your help.


Answer 1:


(After changing "value" to "param")

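# Assumes the vectors gp1, gp2 and param already exist in the workspace
ddf = data.frame(gp1, gp2, param)
# ave() applies FUN within each gp1 x gp2 group; the logical result is stored as 0/1.
# Note: .95 flags values above the 95th percentile; use .75 for the 75th percentile in the question.
ddf$category <- with(ddf, ave(param, gp1,gp2, 
                             FUN=function(x) x > quantile(x,.95) ) )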
> ddf
    gp1 gp2 param category
1     2   a    20        0
2     2   a    16        0
3     1   a    12        0
4     1   b    16        0
5     3   b    19        0
 snipped

> sum(ddf$category)
[1] 2
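
As a rough sketch of how the ave() approach might look on the data from the question, the block below uses the 75th percentile that the question asks for instead of the .95 shown in the answer. The names idx, manual, attempt and global are illustrative helpers, not part of the original post. The last part is a sanity check suggesting that the attempt from the question compares every row against the overall 75th percentile rather than a per-group one, because inside with() the names gp1 and gp2 refer to whole columns.

```r
set.seed(15)
ddf <- data.frame(
    gp1 = sample(1:3, 200, replace = TRUE),
    gp2 = sample(c('a', 'b'), 200, replace = TRUE),
    param = sample(10:20, 200, replace = TRUE)
)

# Per-group flag: 1 if param exceeds the 75th percentile of its gp1 x gp2 group
ddf$category <- with(ddf, ave(param, gp1, gp2,
    FUN = function(x) as.integer(x > quantile(x, .75, na.rm = TRUE))))

# Manual check for one group (gp1 == 1, gp2 == 'a'); idx and manual are illustrative names
idx <- ddf$gp1 == 1 & ddf$gp2 == 'a'
manual <- as.integer(ddf$param[idx] > quantile(ddf$param[idx], .75, names = FALSE))
stopifnot(identical(as.integer(ddf$category[idx]), manual))

# The attempt from the question: ddf$gp1 == gp1 is TRUE for every row inside with(),
# so the subset is the whole data frame and the threshold is the overall quantile
attempt <- with(ddf, ifelse(param > quantile(ddf[ddf$gp1 == gp1 & ddf$gp2 == gp2, ]$param,
                                             .75, na.rm = TRUE), 1, 0))
global <- ifelse(ddf$param > quantile(ddf$param, .75, names = FALSE), 1, 0)
stopifnot(identical(attempt, global))

# Number of flagged rows per group
with(ddf, tapply(category, list(gp1, gp2), sum))
```

Note that the comparison is strict (>), so rows exactly at the group's 75th percentile get 0; with integer-valued data such as param, ties at the threshold are common.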


Source: https://stackoverflow.com/questions/27084753/categorize-dataframe-by-percentile-in-r
