Identify the dates on which a value changes and summarize the data with sum() and diff() in R

故里飘歌 2021-01-06 12:48

Sample Data:

 product_id <- c("1000","1000","1000","1000","1000","1000", "1002","1002","1002","1002","1002","1002")
    qty_orde         


        
1 Answer
  • 2021-01-06 13:05

    Using data.table:

    library(data.table)
    setDT(sampleData)
    

    Some preprocessing, to parse the date strings into Date objects:

    sampleData[, firstdate := as.Date(date, "%m/%d/%y")]
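
As a small illustration of that parsing step, here is a sketch in base R. The date strings are hypothetical (the question's sample data is cut off above); the only thing taken from the answer is the "%m/%d/%y" format, where %y means a two-digit year.

```r
# Hypothetical date strings in month/day/two-digit-year form
raw <- c("01/15/19", "2/5/19", "12/31/19")

# %m = month, %d = day, %y = two-digit year (19 is read as 2019)
parsed <- as.Date(raw, format = "%m/%d/%y")

print(parsed)
print(class(parsed))
```

Once the column is a Date, subtraction and min()/max() work on it directly, which the later steps rely on.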
    

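    Given how you calculate the date difference, it is easier to give each row a date range: its own date as the start, and the next date for the same product as the end (the last row of each product ends on its own date):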

    sampleData[, lastdate := shift(firstdate,type = "lead"), by = product_id]
    sampleData[is.na(lastdate), lastdate := firstdate]
    # Arun's one step: sampleData[, lastdate := shift(firstdate, type="lead", fill=firstdate[.N]), by = product_id]
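
To make the effect of the lead shift concrete, here is a minimal sketch on a toy table. The table `toy` and its values are made up for illustration; only `shift(..., type = "lead")` and the NA fill mirror the answer's code.

```r
library(data.table)

# Hypothetical toy data: two products, three order dates each
toy <- data.table(
  product_id = c("A", "A", "A", "B", "B", "B"),
  firstdate  = as.Date(c("2019-01-01", "2019-01-10", "2019-02-01",
                         "2019-01-05", "2019-01-20", "2019-03-01"))
)

# Next order date within the same product; the last row of each
# product has no successor, so shift() leaves it as NA
toy[, lastdate := shift(firstdate, type = "lead"), by = product_id]

# Close the final interval of each product on its own date
toy[is.na(lastdate), lastdate := firstdate]

print(toy)
```

Each row now covers the span from its own date to the next order of the same product, and the final row of each product spans zero days.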
    

    Then create a new ID for every change in price:

    sampleData[, price_id := cumsum(c(0,diff(price) != 0)), by = product_id]
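
The `cumsum(c(0, diff(price) != 0))` idiom may be easier to follow on a plain vector. The sketch below uses a hypothetical price vector and base R only:

```r
# Hypothetical prices for one product, in date order
price <- c(2.49, 2.49, 1.743, 2.49, 2.49)

# TRUE wherever a price differs from the one before it
changed <- diff(price) != 0

# Prepend 0 for the first row, then count the changes seen so far;
# rows that share a counter value belong to the same price run
price_id <- cumsum(c(0, changed))

print(changed)
print(price_id)
```

Note that a price returning to an earlier value starts a new run rather than rejoining the old one, which is why grouping by `price` alone would not give the same result.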
    

    Then calculate your groupwise functions, by product and price run:

    sampleData[,
               .(
                 price = unique(price),
                 sum_qty = sum(qty_ordered),
                 date_diff = max(lastdate) - min(firstdate)
               ),
               by = .(
                 product_id,
                 price_id
               )
               ]
    
       product_id price_id price sum_qty date_diff
    1:       1000        0 2.490       4   21 days
    2:       1000        1 1.743       1   61 days
    3:       1000        2 2.490       2   33 days
    4:       1002        0 2.093       3   28 days
    5:       1002        1 2.110       4   31 days
    6:       1002        2 2.970       1    0 days
    

    I think the last price run for product 1000 lasts only 33 days, and the preceding one 61 (not 60). If you also count the first day, the durations are 22, 62 and 34 days, and the line should read date_diff = max(lastdate) - min(firstdate) + 1
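
The difference between the two counting conventions can be checked with two dates. The dates below are hypothetical, chosen to be 21 days apart like the first run in the output:

```r
# Hypothetical start and end of one price run
start <- as.Date("2019-01-01")
end   <- as.Date("2019-01-22")

# Subtracting Dates gives a difftime in days that excludes the first day
exclusive <- end - start

# Adding 1 counts both the first and the last day
inclusive <- end - start + 1

print(exclusive)
print(inclusive)
```

Which convention is right depends on whether the day of the price change should count toward the run it starts.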
