Identify the date a value changes and summarize the data with sum() and diff() in R

邮差的信, submitted 2019-12-01 08:53:05

Using data.table:

library(data.table)
setDT(sampleData)
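setDT() converts a data.frame to a data.table by reference, so its result does not need to be assigned back. Here is a minimal sketch on a hypothetical two-row data.frame; df and its values are made up for illustration:

```r
library(data.table)

# Hypothetical data.frame standing in for sampleData
df <- data.frame(product_id = c(1000, 1002), price = c(2.49, 2.093))

setDT(df)           # modifies df in place; no `df <- ...` needed
print(class(df))    # "data.table" "data.frame"
```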

Some preprocessing (parse the date strings into Date objects):

sampleData[, firstdate := as.Date(date, "%m/%d/%y")]
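To check the format string, here is a quick test on hypothetical date strings in the month/day/two-digit-year layout the answer assumes for the date column. With %y, R maps two-digit years 00 to 68 to 20xx and 69 to 99 to 19xx:

```r
# Hypothetical date strings in "%m/%d/%y" layout (not taken from sampleData)
x <- c("01/05/17", "01/26/17", "03/28/17")

d <- as.Date(x, format = "%m/%d/%y")
print(d)          # "2017-01-05" "2017-01-26" "2017-03-28"
print(class(d))   # "Date"
```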

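Given how you calculate the date difference, it is easier to give each row a date range: it starts on the row's own date and ends on the next order date for the same product, while the last row of each product ends on its own date: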

sampleData[, lastdate := shift(firstdate,type = "lead"), by = product_id]
sampleData[is.na(lastdate), lastdate := firstdate]
# Arun's one-step version: sampleData[, lastdate := shift(firstdate, type="lead", fill=firstdate[.N]), by = product_id]
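Here is a sketch of the date-range step on a hypothetical toy table, using the two-step version from above. The table dt and its dates are invented for illustration and are not the asker's sampleData:

```r
library(data.table)

# Hypothetical toy data: two products with a few order dates each
dt <- data.table(
  product_id = c(1000, 1000, 1000, 1002, 1002),
  firstdate  = as.Date(c("2017-01-05", "2017-01-26", "2017-03-28",
                         "2017-02-01", "2017-03-01"))
)

# Each row's range ends at the next order date of the same product ...
dt[, lastdate := shift(firstdate, type = "lead"), by = product_id]
# ... and the last row of each product (NA after the lead) ends on its own date
dt[is.na(lastdate), lastdate := firstdate]
print(dt)
```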

Then create a new ID for every change in price:

sampleData[, price_id := cumsum(c(0,diff(price) != 0)), by = product_id]
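The run-ID trick works on any vector. Here is a base-R sketch on a hypothetical price sequence for a single product:

```r
# Hypothetical price sequence for one product
price <- c(2.49, 2.49, 1.743, 2.49, 2.49)

# diff(price) != 0 is TRUE wherever the price differs from the previous row;
# prepending 0 keeps the length equal to length(price), and cumsum()
# turns each change into a new run ID
price_id <- cumsum(c(0, diff(price) != 0))
print(price_id)   # 0 0 1 2 2
```

Returning to an earlier price (2.49 here) starts a new run with a new ID instead of reusing the old one, which is why product 1000 gets three price_id values in the output. data.table's rleid(price) - 1 produces the same IDs.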

Then calculate your groupwise functions, by product and price run:

sampleData[,
           .(
             price = unique(price),
             sum_qty = sum(qty_ordered),
             date_diff = max(lastdate) - min(firstdate)
           ),
           by = .(
             product_id,
             price_id
           )
           ]

   product_id price_id price sum_qty date_diff
1:       1000        0 2.490       4   21 days
2:       1000        1 1.743       1   61 days
3:       1000        2 2.490       2   33 days
4:       1002        0 2.093       3   28 days
5:       1002        1 2.110       4   31 days
6:       1002        2 2.970       1    0 days
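To run the whole pipeline end to end, here is a sketch on a small hypothetical table. The column names follow the answer (product_id, date, price, qty_ordered), but the values are invented, so the numbers will not match the output above:

```r
library(data.table)

# Hypothetical sample data, NOT the asker's sampleData
sampleData <- data.table(
  product_id  = c(1000, 1000, 1000, 1000, 1002, 1002),
  date        = c("01/05/17", "01/12/17", "01/26/17", "02/20/17",
                  "03/01/17", "03/15/17"),
  price       = c(2.49, 2.49, 1.743, 1.743, 2.093, 2.11),
  qty_ordered = c(1, 3, 1, 2, 3, 4)
)

# Parse dates, build per-row date ranges, then number the price runs
sampleData[, firstdate := as.Date(date, "%m/%d/%y")]
sampleData[, lastdate := shift(firstdate, type = "lead"), by = product_id]
sampleData[is.na(lastdate), lastdate := firstdate]
sampleData[, price_id := cumsum(c(0, diff(price) != 0)), by = product_id]

# One row per product and price run; date_diff is a difftime in days
res <- sampleData[, .(
  price     = unique(price),
  sum_qty   = sum(qty_ordered),
  date_diff = max(lastdate) - min(firstdate)
), by = .(product_id, price_id)]
print(res)
```

With these made-up values the four runs have sum_qty 4, 3, 3, 4 and date_diff 21, 25, 14 and 0 days.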

I think the last price run for product 1000 is only 33 days, and the one before it is 61 days (not 60). If you want to count the first day as well, the three runs are 22, 62 and 34 days, and the line should read date_diff = max(lastdate) - min(firstdate) + 1
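The inclusive count is plain Date arithmetic: subtracting two Date objects gives the number of days elapsed, and adding 1 counts both endpoints. A minimal sketch with hypothetical dates:

```r
# Hypothetical run: first order on 2017-01-05, range ends on 2017-01-26
start <- as.Date("2017-01-05")
end   <- as.Date("2017-01-26")

exclusive <- as.numeric(end - start)       # days elapsed
inclusive <- as.numeric(end - start) + 1   # both endpoints counted
cat(exclusive, inclusive, "\n")            # 21 22
```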
