I have a data set with the columns customerID, transactionDate, productId and purchaseQty loaded into a data.table. For each row, I want to calculate the sum and mean of purchaseQty for the pr
This also works and could be considered simpler. It has the advantage of not requiring a sorted input set, and it has fewer dependencies.
I still don't understand why it produces two transactionDate columns in the output. This seems to be a byproduct of the "on" clause. In fact, the output seems to contain one column per element of the on clause, in the order given and without the alias names, with the sum appended after them.
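# For each row, sum purchaseQty for the same product and customer over the
# 45-day window ending at that row's transactionDate (inclusive).
DT[.(p = productId, c = customerID, tmin = transactionDate - 45, tmax = transactionDate),
   on = .(productId == p, customerID == c, transactionDate <= tmax, transactionDate >= tmin),
   .(windowSum = sum(purchaseQty)), by = .EACHI, nomatch = 0]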
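To make the output shape easier to see, here is a minimal sketch that runs the same join on a small made-up table. The data values and the extra windowMean column are my own assumptions for illustration, not part of the original data set. As far as I understand data.table's non-equi joins, the join columns in the result are named after the columns of DT but hold the values from the i side, which would explain the two transactionDate columns (the first carrying tmax, the second tmin).

```r
library(data.table)

# Hypothetical toy data; column names follow the query above.
DT <- data.table(
  customerID      = c(1L, 1L, 1L, 2L),
  productId       = c(10L, 10L, 10L, 10L),
  transactionDate = as.IDate(c("2020-01-01", "2020-01-20",
                               "2020-03-15", "2020-01-05")),
  purchaseQty     = c(2, 3, 4, 5)
)

# Same non-equi self-join as above, with the mean added alongside the sum.
res <- DT[.(p = productId, c = customerID,
            tmin = transactionDate - 45, tmax = transactionDate),
          on = .(productId == p, customerID == c,
                 transactionDate <= tmax, transactionDate >= tmin),
          .(windowSum = sum(purchaseQty), windowMean = mean(purchaseQty)),
          by = .EACHI, nomatch = 0]

# One result row per row of DT, in the same order, so the window
# statistics can be copied back by position.
DT[, `:=`(windowSum = res$windowSum, windowMean = res$windowMean)]

print(names(res))  # shows the repeated transactionDate column name
print(DT)
```

Copying the results back by position relies on every row matching at least itself, so nomatch = 0 drops nothing here. If the window bounds were changed so that a row could fall outside its own window, a keyed merge would be the safer choice.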