Question
Say I have a dataset:
mydat=structure(list(code = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = "25481МСК", class = "factor"),
item = c(13163L, 13163L, 13163L, 13163L, 13163L, 13163L,
13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13163L,
13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13163L,
13164L, 13164L, 13164L, 13164L, 13164L, 13164L, 13164L, 13164L,
13164L, 13164L, 13164L, 13164L, 13164L, 13164L, 13164L, 13164L,
13164L, 13164L, 13164L, 13164L, 13164L, 13164L), sales = c(4L,
1L, 10L, 6L, 8L, 3L, 11L, 6L, 4L, 2L, 4L, 2L, 4L, 3L, 10L,
4L, 15L, 10L, 6L, 6L, 5L, 4L, 4L, 1L, 10L, 6L, 8L, 3L, 11L,
6L, 4L, 2L, 4L, 2L, 4L, 3L, 10L, 4L, 15L, 10L, 6L, 6L, 5L,
4L), action = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L,
0L, 0L, 0L)), .Names = c("code", "item", "sales", "action"
), class = "data.frame", row.names = c(NA, -44L))
I have two grouping variables, code and item. Here are the two groups:
25481МСК 13163
25481МСК 13164
I also have an action column, which can take only two values: zero (0) or one (1). I need to calculate the median of sales over the three zero-action rows that come before a run of ones in the action column and the three zero-action rows that come after that run.
Here is an example:
sales action output
2 0 2
4 0 4
3 0 3
10 1 **5**
4 1 **5**
15 1 **5**
10 0 10
6 0 6
6 0 6
median(2, 4, 3, 10, 6, 6) = 5
So the median of the zero rows before and after the ones is 5. Then the sales values in the run of ones (action = 1) that lies between these zeros are replaced by this median. As can be seen from the example, there are other runs of ones inside zeros, and the same principle must be applied to them. BUT if the median is greater than the sales value, do not replace it.
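For the single 9-row window in the example above, the rule can be sketched in base R as follows. This is a minimal illustration that assumes exactly one run of ones with three zero rows on each side; `sales`, `action`, `med` and `output` are illustrative names, not part of the dataset.

```r
# The 9-row example: 3 zeros, a run of 3 ones, 3 zeros
sales  <- c(2, 4, 3, 10, 4, 15, 10, 6, 6)
action <- c(0, 0, 0, 1, 1, 1, 0, 0, 0)

# Median of the zero-action rows around the run of ones.
# With 6 values the median averages the two middle ones: (4 + 6) / 2 = 5
med <- median(sales[action == 0])

# Action rows get the median, but never more than their own sales (pmin);
# zero-action rows keep their sales value
output <- ifelse(action == 1, pmin(sales, med), sales)
output
#> [1]  2  4  3  5  4  5 10  6  6
```

This reproduces the "real case" table further below (5, 4, 5 for the action rows).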
For example, suppose
sales action
10 1
5 1
14 1
and the median of the zeros is 12. In this case the output would be
output
10
5
12
Only 14 must be replaced, because it is greater than the median.
In the real case:
sales action output
2 0 2
4 0 4
3 0 3
10 1 **5**
4 1 **4**
15 1 **5**
10 0 10
6 0 6
6 0 6
It should be done for each group separately.
25481МСК 13163
25481МСК 13164
The desired output
code item sales action output
1 25481МСК 13163 4 0 4
2 25481МСК 13163 1 0 1
3 25481МСК 13163 10 0 10
4 25481МСК 13163 6 0 6
5 25481МСК 13163 8 0 8
6 25481МСК 13163 3 0 3
7 25481МСК 13163 11 0 11
8 25481МСК 13163 6 0 6
9 25481МСК 13163 4 0 4
10 25481МСК 13163 2 0 2
11 25481МСК 13163 4 0 4
12 25481МСК 13163 2 0 2
13 25481МСК 13163 4 0 4
14 25481МСК 13163 3 0 3
15 25481МСК 13163 10 1 5
16 25481МСК 13163 4 1 5
17 25481МСК 13163 15 1 5
18 25481МСК 13163 10 0 10
19 25481МСК 13163 6 0 6
20 25481МСК 13163 6 0 6
21 25481МСК 13163 5 0 5
22 25481МСК 13163 4 0 4
23 25481МСК 13164 4 0 4
24 25481МСК 13164 1 0 1
25 25481МСК 13164 10 0 10
26 25481МСК 13164 6 0 6
27 25481МСК 13164 8 0 8
28 25481МСК 13164 3 0 3
29 25481МСК 13164 11 0 11
30 25481МСК 13164 6 0 6
31 25481МСК 13164 4 0 4
32 25481МСК 13164 2 0 2
33 25481МСК 13164 4 0 4
34 25481МСК 13164 2 0 2
35 25481МСК 13164 4 0 4
36 25481МСК 13164 3 0 3
37 25481МСК 13164 10 1 5
38 25481МСК 13164 4 1 5
39 25481МСК 13164 15 1 5
40 25481МСК 13164 10 0 10
41 25481МСК 13164 6 0 6
42 25481МСК 13164 6 0 6
43 25481МСК 13164 5 0 5
44 25481МСК 13164 4 0 4
Note that for action = 0 the sales value should also be copied into the output column. How can I do this?
P.S. Please ignore the fact that some medians in this output are greater than the sales values. It's just a test.
For Eric:
code item sales action output
52382МСК 11709 1 0 1
52382МСК 11709 10 1 NA
52382МСК 11709 1 0 1
52382МСК 11709 3 0 3
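A minimal base R sketch of the whole procedure, under the assumption that a run of ones gets a median only when it has three zero-action rows directly before and after it within its own code + item group (otherwise `NA`, as in Eric's table). The helper `replace_action_median()` and the small `demo` data frame are hypothetical, not taken from the question or the answers; `ave()` applies the helper to each code + item group.

```r
# Hypothetical helper: replace every run of action == 1 by the median sales of
# the n_before zero rows before it and the n_after zero rows after it, capped
# at the actual sales value. A run without enough surrounding zeros gets NA.
replace_action_median <- function(sales, action, n_before = 3L, n_after = 3L) {
  output <- as.double(sales)               # zero-action rows keep their sales
  r <- rle(as.integer(action))             # runs of equal action values
  ends <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1L
  for (k in which(r$values == 1L)) {
    rows <- starts[k]:ends[k]
    window <- c(seq(starts[k] - n_before, length.out = n_before),
                seq(ends[k] + 1L, length.out = n_after))
    # the whole window must lie inside the group and contain only zeros
    ok <- all(window >= 1L & window <= length(action)) &&
      all(action[window] == 0L)
    output[rows] <- if (ok) pmin(sales[rows], median(sales[window])) else NA_real_
  }
  output
}

# Small made-up data: the 9-row example repeated for two items
demo <- data.frame(
  code   = "25481MCK",
  item   = rep(c(13163L, 13164L), each = 9),
  sales  = rep(c(2L, 4L, 3L, 10L, 4L, 15L, 10L, 6L, 6L), times = 2),
  action = rep(c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L), times = 2)
)

# Apply the helper separately to each code + item group via the row indices
demo$output <- ave(seq_len(nrow(demo)), demo$code, demo$item,
                   FUN = function(i) replace_action_median(demo$sales[i], demo$action[i]))
demo$output[1:9]
#> [1]  2  4  3  5  4  5 10  6  6

# Eric's case: only one zero before the action row, so no median is available
replace_action_median(c(1, 10, 1, 3), c(0, 1, 0, 0))
#> [1]  1 NA  1  3
```

With the question's `mydat` in place of `demo`, the same `ave()` call should give 5, 4, 5 in rows 15 to 17 and 37 to 39.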
Answer 1:
I think this gets close to a solution (to be honest, I'm not sure I fully understand the question):
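library(dplyr)

# One row per action == 1 row of mydat. Assumes every run of ones is exactly
# 3 rows long, with 3 zero rows directly before and after it.
replacements <-
  tibble(                                          # tibble() replaces the deprecated data_frame()
    action1 = which(mydat$action == 1L),           # row indices with action == 1
    group = rep(1:length(action1), each = 3, length.out = length(action1)),  # run id
    sales1 = mydat$sales[action1],                 # sales on the action row
    sales_before = mydat$sales[action1 - 3L],      # matching zero row before the run
    sales_after = mydat$sales[action1 + 3L]        # matching zero row after the run
  ) %>%
  group_by(group) %>%
  mutate(
    med = median(c(sales_before, sales_after)),    # median of the 6 surrounding zeros
    output = pmin(sales1, med)                     # keep sales if it is below the median
  )

mydat$output <- mydat$sales
mydat$output[replacements$action1] <- replacements$output
mydat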
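The `group` column above assumes that every run of ones is exactly 3 rows long (`action1 - 3L` and `action1 + 3L` make the same assumption). A small illustration with hypothetical row indices, where the first run has only 2 rows, shows how `rep(..., each = 3)` then mixes two runs, whereas numbering runs by contiguity with `cumsum(diff(...) != 1)` keeps them apart. The contiguity numbering is a swapped-in technique, not part of the answer.

```r
# Hypothetical row indices of action == 1: a run of 2 rows, then a run of 3
action1 <- c(15L, 16L, 37L, 38L, 39L)

# Fixed-size grouping as in the answer (seq_along() is equivalent to
# 1:length() here): row 37 lands in the same group as rows 15 and 16
rep(seq_along(action1), each = 3, length.out = length(action1))
#> [1] 1 1 1 2 2

# Grouping by contiguity: a new run starts wherever the index jumps by more than 1
run_id <- cumsum(c(1L, diff(action1) != 1L))
run_id
#> [1] 1 1 2 2 2
```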
Answer 2:
If I understand correctly, the OP wants to compare the sales figure during a sales action with the sales figures before and after the sales action for a specific product (code, item).
The expected output is the sales figure on zero-action days. On action days, this figure is to be replaced by the median sales of the surrounding zero-action days, but only if that median is less than the actual sales figure.
The duration of each sales action is given by each streak of contiguous 1s in the action column. The median sales figure is to be computed over the 3 zero-action days before and the 3 zero-action days after each streak.
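A streak of contiguous ones can also be located without regular expressions, for example with base R's `rle()`. A minimal sketch on the 9-row example from the question; the variable names are illustrative.

```r
action <- c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L)

# Run-length encoding: values 0, 1, 0 with lengths 3, 3, 3
r <- rle(action)

# First and last row of every run
ends <- cumsum(r$lengths)
starts <- ends - r$lengths + 1L

# Keep only the streaks of ones: here a single streak covering rows 4 to 6
streaks <- data.frame(start = starts, end = ends)[r$values == 1L, ]
streaks
```

The 3 zero-action days before such a streak are then rows `start - 3` to `start - 1`, and the 3 after it are rows `end + 1` to `end + 3`.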
With the function below
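sales_action <- function(DF, zeros_before, zeros_after) {
  library(data.table)
  library(magrittr)
  # Regex for a streak of ones preceded by `zeros_before` zeros and followed
  # (lookahead) by `zeros_after` zeros, e.g. "0001+(?=000)"
  action_pattern <-
    do.call(sprintf,
            c(fmt = "%s1+(?=%s)",
              stringr::str_dup("0", c(zeros_before, zeros_after)) %>% as.list()
            ))
  message("Action pattern used: ", action_pattern)
  # setDT() converts DF to a data.table by reference; rn is the row number
  setDT(DF)[, rn := .I]
  # Per group, collapse `action` into a string such as "000111000", locate the
  # matches and translate the string positions into row numbers
  tmp <- DF[, paste(action, collapse = "") %>%
              stringr::str_locate_all(action_pattern) %>%
              as.data.table() %>%
              lapply(function(x) rn[x]),
            by = .(code, item)][
              # the lookahead is not part of the match, so extend each window
              , end := end + zeros_after]
  # Non-equi join: median of the zero-action sales inside each window
  DF[tmp, on = .(code, item, rn >= start, rn <= end),
     med := as.double(median(sales[action == 0])), by = .EACHI][
       # output = sales, except on action days where it is capped at the median
       , output := as.double(sales)][action == 1, output := pmin(sales, med)][
         , c("rn", "med") := NULL][]
}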
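The pattern construction and position lookup inside `sales_action()` can be reproduced with base R alone, which may help when reading the function. This sketch is an illustration under assumptions: it uses `strrep()` and `gregexpr(..., perl = TRUE)` in place of `stringr::str_dup()` and `stringr::str_locate_all()` (base R needs `perl = TRUE` for the lookahead), and the 12-value `action` vector is made up.

```r
# Build the pattern the same way as sales_action() does
zeros_before <- 3L
zeros_after  <- 3L
pattern <- sprintf("%s1+(?=%s)", strrep("0", zeros_before), strrep("0", zeros_after))
pattern
#> [1] "0001+(?=000)"

# Collapse one group's action column into a string and locate the match
action <- c(0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L)
s <- paste(action, collapse = "")          # "000001110000"
m <- gregexpr(pattern, s, perl = TRUE)[[1]]

# The match covers the 3 leading zeros and the ones; the lookahead zeros are
# not included, so the window end is extended by zeros_after
start <- as.integer(m)
end <- start + attr(m, "match.length") - 1L + zeros_after
c(start = start, end = end)                # window covers rows 3 to 11
```

For an action string with fewer than three zeros before the ones, such as "0100" in Eric's example, `gregexpr()` returns `-1` (no match), so no window is found for that streak.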
we get for the sample dataset:
sales_action(mydat, 3L, 3L)
Action pattern used: 0001+(?=000)
        code  item sales action output
 1: 25481MCK 13163     4      0      4
 2: 25481MCK 13163     1      0      1
 3: 25481MCK 13163    10      0     10
 4: 25481MCK 13163     6      0      6
 5: 25481MCK 13163     8      0      8
 6: 25481MCK 13163     3      0      3
 7: 25481MCK 13163    11      0     11
 8: 25481MCK 13163     6      0      6
 9: 25481MCK 13163     4      0      4
10: 25481MCK 13163     2      0      2
11: 25481MCK 13163     4      0      4
12: 25481MCK 13163     2      0      2
13: 25481MCK 13163     4      0      4
14: 25481MCK 13163     3      0      3
15: 25481MCK 13163    10      1      5
16: 25481MCK 13163     4      1      4
17: 25481MCK 13163    15      1      5
18: 25481MCK 13163    10      0     10
19: 25481MCK 13163     6      0      6
20: 25481MCK 13163     6      0      6
21: 25481MCK 13163     5      0      5
22: 25481MCK 13163     4      0      4
23: 25481MCK 13164     4      0      4
24: 25481MCK 13164     1      0      1
25: 25481MCK 13164    10      0     10
26: 25481MCK 13164     6      0      6
27: 25481MCK 13164     8      0      8
28: 25481MCK 13164     3      0      3
29: 25481MCK 13164    11      0     11
30: 25481MCK 13164     6      0      6
31: 25481MCK 13164     4      0      4
32: 25481MCK 13164     2      0      2
33: 25481MCK 13164     4      0      4
34: 25481MCK 13164     2      0      2
35: 25481MCK 13164     4      0      4
36: 25481MCK 13164     3      0      3
37: 25481MCK 13164    10      1      5
38: 25481MCK 13164     4      1      4
39: 25481MCK 13164    15      1      5
40: 25481MCK 13164    10      0     10
41: 25481MCK 13164     6      0      6
42: 25481MCK 13164     6      0      6
43: 25481MCK 13164     5      0      5
44: 25481MCK 13164     4      0      4
        code  item sales action output
which is in line with the OP's expectations.
Please note that the desired output shown in the question does not follow the OP's own rules in rows 16 and 38: there the median (5) is greater than the actual sales figure (4), so the actual sales figure of 4 should have been kept.
For an explanation of the function please see here.
Source: https://stackoverflow.com/questions/51876158/replace-median-for-category-by-condition-of-three-zero-before-and-three-after-se