Find the 75th percentile and replace by the median for each group in R

白昼怎懂夜的黑 submitted on 2019-12-24 01:25:17

Question


This problem is similar to my own earlier topic, calculation of 90 percentile and replacement of it by median by groups in R, with the following differences.

In that topic the calculation was based on the 14 zero rows preceding the one category of action, while the replacement by the median was applied to all rows of the zero category, for each code+item group.

Now I use all zero rows, not just the 14 preceding ones, and I don't touch negative and zero values of return.

Within the group variable action (0, 1), for the zero category I want to find the 75th percentile of the return variable. If a value is greater than the 75th percentile, it must be replaced by the median of the zero category. There is also a code variable, and this procedure must be performed for each code separately. Note: negative and zero values are left untouched.

mydat=structure(list(code = c(123L, 123L, 123L, 123L, 123L, 123L, 123L, 
123L, 123L, 123L, 123L, 123L, 124L, 124L, 124L, 124L, 124L, 124L, 
124L, 124L, 124L, 124L, 124L, 124L), action = c(0L, 0L, 0L, 0L, 
0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 
1L, 1L, 1L, 1L), return = c(-1L, 0L, 23L, 100L, 18L, 15L, -1L, 
0L, 23L, 100L, 18L, 15L, -1L, 0L, 23L, 100L, 18L, 15L, -1L, 0L, 
23L, 100L, 18L, 15L)), .Names = c("code", "action", "return"), class = "data.frame", row.names = c(NA, 
-24L))

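For example, the positive return values in the zero category are:

23
100
18
15

How can I get the output below? For these values the 75th percentile is 42.25 and the median is 20.5, which is used as the replacement.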

code  action  return
123   0       -1
123   0       0
123   0       23
123   0       ***20.5
123   0       18
123   0       15
123   1       -1
123   1       0
123   1       23
123   1       100
123   1       18
123   1       15
124   0       -1
124   0       0
124   0       23
124   0       ***20.5
124   0       18
124   0       15
124   1       -1
124   1       0
124   1       23
124   1       100
124   1       18
124   1       15

Using Uwe's great solution, I get this error:

Error in `[.data.table`(mydat[action == 0, `:=`(output, as.double(return))],  : 
  Column(s) [action] not found in i

How can I leave negative and zero values untouched, and why did this error occur?

library(data.table)
library(magrittr)  # provides %>% and set_names() used below
# mark the zero action rows before the action period
setDT(mydat)[, zero_before := cummax(action), by = .(code)]
# compute median and 75% quantile (column still named q90) from the tail of the rows before each action period
agg <- mydat[zero_before == 0, 
             quantile(tail(return), c(0.5, 0.75)) %>% 
               as.list()  %>% 
               set_names(c("med", "q90")) %>% 
               c(.(zero_before = 0)), by = .(code)]
agg


# append output column
mydat[action == 0, output := as.double(return)][
  # replace output values greater q90 in an update non-equi join
  agg, on = .(code,action, return > q90), output := as.double(med)][
    # remove helper column
    , zero_before := NULL]

Answer 1:


If I understand correctly, the OP wants to compute the median and the 75% quantile of return within each group, based on all zero action rows where the return is greater than 0. Then, any return value in a zero action row which exceeds the 75% quantile of the respective group is to be replaced by the group median.
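
As a quick check of the numbers quoted in the question, the two thresholds for one group can be computed by hand in base R. This is only a minimal sketch: it assumes R's default quantile definition (type 7), and the vector name `pos` is made up for illustration.

```r
# positive returns among the zero action rows of one code
# (the sample data has the same values for codes 123 and 124)
pos <- c(23, 100, 18, 15)

q75 <- quantile(pos, 0.75, names = FALSE)  # default type 7 interpolation
med <- median(pos)

q75  # 42.25
med  # 20.5

# only values above the 75% quantile are replaced by the median
ifelse(pos > q75, med, pos)  # 23.0 20.5 18.0 15.0
```

With these values only the return of 100 lies above the 75% quantile, so it is the only one replaced by 20.5.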

The code can be largely simplified as we do not have to distinguish between zero action rows before and after the action rows.
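
This also bears on the error reported in the question. There, the aggregate table is grouped by code only and carries zero_before instead of action, so a join that lists action in `on` cannot find that column in the table passed as `i`. The sketch below illustrates this with toy stand-ins; the names `x` and `agg_q` are hypothetical, and the values are copied from the question's data.

```r
library(data.table)

# toy stand-ins for the two tables in the question
x <- data.table(code = c(123L, 123L), action = c(0L, 1L), return = c(100L, 100L))
agg_q <- data.table(code = 123L, med = 20.5, q90 = 42.25, zero_before = 0)

# every equi-join column named in `on` must exist on both sides of the join;
# agg_q has no `action` column
setdiff(c("code", "action"), names(agg_q))  # "action"

# the join then stops with an error like the one quoted in the question
res <- tryCatch(
  x[agg_q, on = .(code, action, return > q90), output := med],
  error = function(e) conditionMessage(e)
)
res
```

Grouping the aggregate by both code and action puts the action column into that table, which is why the join can then match on it.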

The code below reproduces the expected result:

library(data.table)
library(magrittr)
# compute median and 75% quantile of the positive returns in the zero action rows of each group
agg <- setDT(mydat)[action == 0 & return > 0, 
                    quantile(return, c(0.5, 0.75)) %>% 
                      as.list()  %>% 
                      set_names(c("med", "q75")), by = .(code, action)]

# append output column
mydat[, output := as.double(return)][
  # replace output values greater q75 in an update non-equi join
  agg, on = .(code, action, return > q75), output := as.double(med)]
mydat[]
    code action return output
 1:  123      0     -1   -1.0
 2:  123      0      0    0.0
 3:  123      0     23   23.0
 4:  123      0    100   20.5
 5:  123      0     18   18.0
 6:  123      0     15   15.0
 7:  123      1     -1   -1.0
 8:  123      1      0    0.0
 9:  123      1     23   23.0
10:  123      1    100  100.0
11:  123      1     18   18.0
12:  123      1     15   15.0
13:  124      0     -1   -1.0
14:  124      0      0    0.0
15:  124      0     23   23.0
16:  124      0    100   20.5
17:  124      0     18   18.0
18:  124      0     15   15.0
19:  124      1     -1   -1.0
20:  124      1      0    0.0
21:  124      1     23   23.0
22:  124      1    100  100.0
23:  124      1     18   18.0
24:  124      1     15   15.0
    code action return output
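
For comparison, the same replacement can be sketched without the auxiliary table and without magrittr, by updating the zero action rows of each code directly. This is an alternative formulation, not the approach used above. The helper `replace_above_q75` and the table `dt` are hypothetical names, and the default type 7 quantile is assumed again.

```r
library(data.table)

# same data as in the question, written compactly
dt <- data.table(
  code   = rep(c(123L, 124L), each = 12),
  action = rep(rep(0:1, each = 6), times = 2),
  return = rep(c(-1L, 0L, 23L, 100L, 18L, 15L), times = 4)
)

# hypothetical helper: values above the 75% quantile of the positive
# values are replaced by the median of the positive values
replace_above_q75 <- function(x) {
  out <- as.double(x)
  pos <- out[out > 0]
  if (length(pos) == 0L) return(out)  # no positive values, nothing to replace
  out[out > quantile(pos, 0.75, names = FALSE)] <- median(pos)
  out
}

dt[, output := as.double(return)]
# update only the zero action rows, separately for each code
dt[action == 0, output := replace_above_q75(return), by = code]

dt[return != output]  # the rows that were changed
```

Negative and zero returns stay as they are because they can never exceed a quantile computed from positive values only.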


Source: https://stackoverflow.com/questions/52074105/find-75-percentile-and-replacing-by-median-for-each-group-in-r
