Replace values with the median of the three zeros before and three zeros after, separately by group, in R

Submitted by 烂漫一生 on 2019-12-12 10:18:29

Question


Say I have this dataset:

 mydat=structure(list(code = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L), .Label = "25481МСК", class = "factor"), 
    item = c(13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 
    13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 
    13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 
    13164L, 13164L, 13164L, 13164L, 13164L, 13164L, 13164L, 13164L, 
    13164L, 13164L, 13164L, 13164L, 13164L, 13164L, 13164L, 13164L, 
    13164L, 13164L, 13164L, 13164L, 13164L, 13164L), sales = c(4L, 
    1L, 10L, 6L, 8L, 3L, 11L, 6L, 4L, 2L, 4L, 2L, 4L, 3L, 10L, 
    4L, 15L, 10L, 6L, 6L, 5L, 4L, 4L, 1L, 10L, 6L, 8L, 3L, 11L, 
    6L, 4L, 2L, 4L, 2L, 4L, 3L, 10L, 4L, 15L, 10L, 6L, 6L, 5L, 
    4L), action = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 
    0L, 0L, 0L)), .Names = c("code", "item", "sales", "action"
), class = "data.frame", row.names = c(NA, -44L))

I have two grouping variables, code and item. There are two groups here:

25481МСК    13163
25481МСК    13164

There is also an action column, which can take only two values: zero (0) or one (1). I need to calculate the median of sales over the three zero-action rows immediately before a run of ones and the three zero-action rows immediately after it.

Here is an example:

sales   action  output
2          0    2
4          0    4
3          0    3
10         1    **5**
4          1    **5**
15         1    **5**
10         0    10
6          0    6
6          0    6

median(2, 4, 3, 10, 6, 6) = 5

So the median of the zero-action rows before and after the ones is 5, and the sales values in the rows where action is 1 (the ones enclosed by these zeros) are replaced by this median. The data can contain other runs of ones between zeros, and the same principle must be applied to them. However, if the median is greater than the sales value, do not replace it.

For example, suppose

sales   action
10       1
5        1
14       1

and the median of the zero-action rows is 12. In this case the output would be

output
10
5
12

Only 14 must be replaced, because it is greater than the median.

In the real case:
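The rule above amounts to capping each action-row sale at the median. A minimal base R sketch of that step, using the three sales values and the median of 12 from this example (the variable names are only illustrative):

```r
# Sales on the action rows and the median of the surrounding zero-action rows
sales_during_action <- c(10, 5, 14)
surrounding_median  <- 12

# pmin() takes the element-wise minimum, so only values above the median change
output <- pmin(sales_during_action, surrounding_median)
print(output)
```

Here 10 and 5 stay as they are and 14 becomes 12.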

sales   action  output
2          0    2
4          0    4
3          0    3
10         1    **5**
4          1    **4**
15         1    **5**
10         0    10
6          0    6
6          0    6
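The nine rows above can be reproduced step by step in base R. This is only a sketch for a single run of ones with exactly three zeros on each side; the variable names are illustrative and not part of the original data:

```r
sales  <- c(2, 4, 3, 10, 4, 15, 10, 6, 6)
action <- c(0, 0, 0, 1, 1, 1, 0, 0, 0)

# Positions of the action rows (assumes one contiguous run of ones)
on_action <- which(action == 1)

# Three zero-action rows before and after the run
before <- sales[(min(on_action) - 3):(min(on_action) - 1)]
after  <- sales[(max(on_action) + 1):(max(on_action) + 3)]
med    <- median(c(before, after))

# Keep sales everywhere, but cap the action rows at the median
output <- sales
output[on_action] <- pmin(sales[on_action], med)
print(output)
```

The median is 5, so 10 and 15 are replaced by 5 while 4 is kept.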

It should be done for each group separately.

25481МСК    13163
25481МСК    13164

The desired output

 code        item sales action output
1  25481МСК 13163     4      0      4
2  25481МСК 13163     1      0      1
3  25481МСК 13163    10      0     10
4  25481МСК 13163     6      0      6
5  25481МСК 13163     8      0      8
6  25481МСК 13163     3      0      3
7  25481МСК 13163    11      0     11
8  25481МСК 13163     6      0      6
9  25481МСК 13163     4      0      4
10 25481МСК 13163     2      0      2
11 25481МСК 13163     4      0      4
12 25481МСК 13163     2      0      2
13 25481МСК 13163     4      0      4
14 25481МСК 13163     3      0      3
15 25481МСК 13163    10      1      5
16 25481МСК 13163     4      1      5
17 25481МСК 13163    15      1      5
18 25481МСК 13163    10      0     10
19 25481МСК 13163     6      0      6
20 25481МСК 13163     6      0      6
21 25481МСК 13163     5      0      5
22 25481МСК 13163     4      0      4
23 25481МСК 13164     4      0      4
24 25481МСК 13164     1      0      1
25 25481МСК 13164    10      0     10
26 25481МСК 13164     6      0      6
27 25481МСК 13164     8      0      8
28 25481МСК 13164     3      0      3
29 25481МСК 13164    11      0     11
30 25481МСК 13164     6      0      6
31 25481МСК 13164     4      0      4
32 25481МСК 13164     2      0      2
33 25481МСК 13164     4      0      4
34 25481МСК 13164     2      0      2
35 25481МСК 13164     4      0      4
36 25481МСК 13164     3      0      3
37 25481МСК 13164    10      1      5
38 25481МСК 13164     4      1      5
39 25481МСК 13164    15      1      5
40 25481МСК 13164    10      0     10
41 25481МСК 13164     6      0      6
42 25481МСК 13164     6      0      6
43 25481МСК 13164     5      0      5
44 25481МСК 13164     4      0      4

Note that the value of the sales column for action = 0 should also appear in the output column. How can I do this?

P.S. Please ignore the fact that some medians in this output are greater than the sales values. It's just a test.

for Eric

code    item    sales   action  output
52382МСК    11709   1   0   1
52382МСК    11709   10  1   NA
52382МСК    11709   1   0   1
52382МСК    11709   3   0   3

Answer 1:


I think this gets near a solution? (to be honest, I'm not sure I fully understand the question)

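library(dplyr)

# Note: this works on row positions of the whole data frame and assumes that
# every run of action == 1 is exactly three rows long, with at least three
# rows before and after it.
replacements <- 
  tibble(  # tibble() replaces the deprecated data_frame()
    action1      = which(mydat$action == 1L),
    # label each block of three consecutive action rows as one group
    group        = rep(seq_along(action1), each = 3, length.out = length(action1)),
    sales1       = mydat$sales[action1],
    # sales three rows before and three rows after each action row
    sales_before = mydat$sales[action1 - 3L],
    sales_after  = mydat$sales[action1 + 3L]
  ) %>%
  group_by(group) %>%
  mutate(
    med    = median(c(sales_before, sales_after)),
    # replace sales only where they exceed the median
    output = pmin(sales1, med)
  )

# start from the original sales, then overwrite the action rows
mydat$output <- mydat$sales
mydat$output[replacements$action1] <- replacements$output

mydat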



Answer 2:


If I understand correctly, the OP wants to compare the sales figure during a sales action with sales figures before and after the sales action for a specific product (code, item).

The expected output is the sales figure on zero-action days. On action days, this figure is to be replaced by the median sales of the surrounding zero-action days, but only if the median is less than the actual sales figure.

The duration of each sales action is given by each streak of contiguous 1s in the action column. The median sales figure is to be computed over the 3 zero-action days before and the 3 zero-action days after each streak.
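The streak logic described above can also be sketched in base R with `rle()`. This is not the function from this answer: `cap_action_runs` is a hypothetical helper written for illustration, and it assumes that a streak is left untouched unless at least `k` zero-action rows sit on both sides of it within the group.

```r
# Hypothetical helper: cap sales inside each streak of 1s at the median of the
# k zero-action rows before and the k zero-action rows after the streak.
cap_action_runs <- function(sales, action, k = 3L) {
  runs   <- rle(action)
  ends   <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1L
  output <- as.numeric(sales)
  for (i in which(runs$values == 1)) {
    # Neighbouring runs are zero runs because rle() alternates values
    before <- if (i > 1) tail(seq(starts[i - 1], ends[i - 1]), k) else integer(0)
    after  <- if (i < length(ends)) head(seq(starts[i + 1], ends[i + 1]), k) else integer(0)
    # Assumption: require k zeros on both sides, otherwise keep the sales
    if (length(before) == k && length(after) == k) {
      rows <- starts[i]:ends[i]
      output[rows] <- pmin(sales[rows], median(sales[c(before, after)]))
    }
  }
  output
}

# Toy data with two groups (made up for illustration), processed per group
demo <- data.frame(
  item   = rep(c("a", "b"), each = 9),
  sales  = rep(c(2, 4, 3, 10, 4, 15, 10, 6, 6), times = 2),
  action = rep(c(0, 0, 0, 1, 1, 1, 0, 0, 0), times = 2)
)
per_group <- lapply(split(demo, demo$item),
                    function(d) cap_action_runs(d$sales, d$action))
demo$output <- unsplit(per_group, demo$item)
print(demo)
```

Splitting by group before looking for streaks keeps a streak at the start of one group from borrowing zeros that belong to the previous group.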

With the function below

sales_action <- function(DF, zeros_before, zeros_after) {
  library(data.table)
  library(magrittr)
  action_pattern <- 
    do.call(sprintf, 
            c(fmt = "%s1+(?=%s)", 
              stringr::str_dup("0", c(zeros_before, zeros_after)) %>% as.list()
            ))
  message("Action pattern used: ", action_pattern)
  setDT(DF)[, rn := .I]
  tmp <- DF[, paste(action, collapse = "") %>% 
              stringr::str_locate_all(action_pattern) %>% 
              as.data.table() %>% 
              lapply(function(x) rn[x]),
            by = .(code, item)][
              , end := end + zeros_after]
  DF[tmp, on = .(code, item, rn >= start, rn <= end), 
     med := as.double(median(sales[action == 0])), by = .EACHI][
       , output := as.double(sales)][action == 1, output := pmin(sales, med)][
         , c("rn", "med") := NULL][]
}
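The regular expression the function builds for `zeros_before = 3` and `zeros_after = 3` is `0001+(?=000)`. To see what it matches, the sketch below applies it to a short, made-up action vector using base R's `gregexpr()` with `perl = TRUE` instead of `stringr::str_locate_all()`; the vector and variable names are illustrative only.

```r
# Toy action vector: four zeros, a streak of three ones, four zeros
action <- c(0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0)
action_string <- paste(action, collapse = "")

# "000" then one or more "1", followed by (but not including) "000";
# the lookahead needs perl = TRUE
m <- gregexpr("0001+(?=000)", action_string, perl = TRUE)[[1]]
start <- as.integer(m)
end   <- start + attr(m, "match.length") - 1L

print(action_string)
print(c(start = start, end = end))
```

The match starts at the first of the three zeros directly before the streak and ends at the last 1, because the lookahead does not consume the trailing zeros. That is why the function afterwards adds `zeros_after` to `end`.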

we get for the sample dataset:

sales_action(mydat, 3L, 3L)
Action pattern used: 0001+(?=000)
        code  item sales action output
 1: 25481MCK 13163     4      0      4
 2: 25481MCK 13163     1      0      1
 3: 25481MCK 13163    10      0     10
 4: 25481MCK 13163     6      0      6
 5: 25481MCK 13163     8      0      8
 6: 25481MCK 13163     3      0      3
 7: 25481MCK 13163    11      0     11
 8: 25481MCK 13163     6      0      6
 9: 25481MCK 13163     4      0      4
10: 25481MCK 13163     2      0      2
11: 25481MCK 13163     4      0      4
12: 25481MCK 13163     2      0      2
13: 25481MCK 13163     4      0      4
14: 25481MCK 13163     3      0      3
15: 25481MCK 13163    10      1      5
16: 25481MCK 13163     4      1      4
17: 25481MCK 13163    15      1      5
18: 25481MCK 13163    10      0     10
19: 25481MCK 13163     6      0      6
20: 25481MCK 13163     6      0      6
21: 25481MCK 13163     5      0      5
22: 25481MCK 13163     4      0      4
23: 25481MCK 13164     4      0      4
24: 25481MCK 13164     1      0      1
25: 25481MCK 13164    10      0     10
26: 25481MCK 13164     6      0      6
27: 25481MCK 13164     8      0      8
28: 25481MCK 13164     3      0      3
29: 25481MCK 13164    11      0     11
30: 25481MCK 13164     6      0      6
31: 25481MCK 13164     4      0      4
32: 25481MCK 13164     2      0      2
33: 25481MCK 13164     4      0      4
34: 25481MCK 13164     2      0      2
35: 25481MCK 13164     4      0      4
36: 25481MCK 13164     3      0      3
37: 25481MCK 13164    10      1      5
38: 25481MCK 13164     4      1      4
39: 25481MCK 13164    15      1      5
40: 25481MCK 13164    10      0     10
41: 25481MCK 13164     6      0      6
42: 25481MCK 13164     6      0      6
43: 25481MCK 13164     5      0      5
44: 25481MCK 13164     4      0      4
        code  item sales action output

which is in line with OP's expectations.

Please note that the OP's desired output shown in the question is inconsistent with the OP's own rules: in rows 16 and 38 the actual sales figure (4) should have been kept instead of the median (5), because the median is greater than the sales figure there.

For an explanation of the function please see here.



Source: https://stackoverflow.com/questions/51876158/replace-median-for-category-by-condition-of-three-zero-before-and-three-after-se
