Sample Data:
product_id <- c("1000","1000","1000","1000","1000","1000", "1002","1002","1002","1002","1002","1002")
qty_orde
Using data.table:
library(data.table)
setDT(sampleData)
Some preprocessing, converting the date strings to Date objects:
sampleData[, firstdate := as.Date(date, "%m/%d/%y")]
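To check the format string before running it on the real data, here is a small sketch on hypothetical date strings (assumed to be month/day/two-digit-year, matching "%m/%d/%y"). Note that a mismatched format gives NA rather than an error, so it is worth checking for NAs after the conversion.

```r
# Hypothetical date strings in month/day/two-digit-year layout
dates  <- c("01/05/16", "01/26/16", "12/31/99")
parsed <- as.Date(dates, "%m/%d/%y")
print(parsed)  # %y maps 00-68 to 20xx and 69-99 to 19xx

# A string in a different layout does not fail loudly: it silently becomes NA
wrong <- as.Date("2016-01-05", "%m/%d/%y")
print(wrong)
```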
Given how you calculate the date difference, it is easier to give each row a date range, running from its own date to the date of the next order for the same product:
sampleData[, lastdate := shift(firstdate,type = "lead"), by = product_id]
sampleData[is.na(lastdate), lastdate := firstdate]
# Arun's one step: sampleData[, lastdate := shift(firstdate, type="lead", fill=firstdate[.N]), by = product_id]
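A minimal sketch of both versions on a hypothetical two-product toy table (not the question's data), showing that they produce the same lastdate column: each row gets the next date within its product, and the last row of each product keeps its own date.

```r
library(data.table)

# Hypothetical toy data: two products with a few order dates each
dt <- data.table(
  product_id = c("A", "A", "A", "B", "B"),
  firstdate  = as.Date(c("2016-01-01", "2016-01-10", "2016-01-25",
                         "2016-02-01", "2016-02-05"))
)

# Two-step version: lead within each product, then fill the trailing NA
dt[, lastdate := shift(firstdate, type = "lead"), by = product_id]
dt[is.na(lastdate), lastdate := firstdate]

# One-step version: pass the product's last date as the fill value
dt[, lastdate2 := shift(firstdate, type = "lead", fill = firstdate[.N]), by = product_id]

print(dt)
```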
Then create a new ID for every change in price:
sampleData[, price_id := cumsum(c(0,diff(price) != 0)), by = product_id]
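The run ID works because diff(price) is nonzero exactly where the price changes, so the cumulative sum increases by one at each change. A sketch on a hypothetical price vector for one product follows. Note that a return to an earlier price starts a new run, which is why product 1000 gets separate price_id values 0 and 2 for the same price. data.table's rleid() should give the same runs numbered from 1 instead of 0; treat it as an optional alternative, not what the code above uses.

```r
library(data.table)

# Hypothetical prices for a single product: 10, then 8, then back to 10
price <- c(10, 10, 8, 10, 10)

# TRUE where the price differs from the previous row; prepend 0 for the first row
run_cumsum <- cumsum(c(0, diff(price) != 0))

# Same runs via rleid(), numbered from 1
run_rleid <- rleid(price)

print(run_cumsum)  # [1] 0 0 1 2 2
print(run_rleid)   # [1] 1 1 2 3 3
```

Since diff(price) != 0 is an exact comparison, this assumes equal prices are stored as identical numbers (as they are when read from the same source).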
Then compute the group-wise summaries by product and price run:
sampleData[,
.(
price = unique(price),
sum_qty = sum(qty_ordered),
date_diff = max(lastdate) - min(firstdate)
),
by = .(
product_id,
price_id
)
]
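As an end-to-end check, here is a sketch of the whole pipeline on a small hypothetical table (one made-up product with three price runs; the question's full data is not reproduced here). Working it by hand: the runs span Jan 1 to Jan 15, Jan 15 to Feb 1, and Feb 1 to Feb 10, so date_diff should be 14, 17 and 9 days.

```r
library(data.table)

# Hypothetical order data: one product, prices 5 -> 4 -> 5
orders <- data.table(
  product_id  = rep("P1", 5),
  date        = c("01/01/16", "01/08/16", "01/15/16", "02/01/16", "02/10/16"),
  price       = c(5, 5, 4, 5, 5),
  qty_ordered = c(1, 2, 3, 1, 1)
)

# Same steps as above: parse dates, build date ranges, number the price runs
orders[, firstdate := as.Date(date, "%m/%d/%y")]
orders[, lastdate := shift(firstdate, type = "lead", fill = firstdate[.N]), by = product_id]
orders[, price_id := cumsum(c(0, diff(price) != 0)), by = product_id]

# One row per product and price run
res <- orders[, .(
  price     = unique(price),
  sum_qty   = sum(qty_ordered),
  date_diff = max(lastdate) - min(firstdate)
), by = .(product_id, price_id)]

print(res)
```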
product_id price_id price sum_qty date_diff
1: 1000 0 2.490 4 21 days
2: 1000 1 1.743 1 61 days
3: 1000 2 2.490 2 33 days
4: 1002 0 2.093 3 28 days
5: 1002 1 2.110 4 31 days
6: 1002 2 2.970 1 0 days
I think the last price run for 1000 lasts only 33 days, and the preceding one is 61 days (not 60). If you include the first day, the values are 22, 62 and 34, and that line should read date_diff = max(lastdate) - min(firstdate) + 1
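The +1 is the difference between counting the days between two dates and counting both endpoints. A quick sketch with two hypothetical dates, using seq() as an independent cross-check of the inclusive count:

```r
# Two hypothetical dates bounding one price run
start <- as.Date("2016-01-01")
end   <- as.Date("2016-01-22")

exclusive <- as.numeric(end - start)      # days between the dates
inclusive <- as.numeric(end - start) + 1  # both endpoints counted

# Cross-check: number of calendar days from start to end, inclusive
n_days <- length(seq(start, end, by = "day"))

print(c(exclusive, inclusive, n_days))  # [1] 21 22 22
```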