Get the last row of a previous group in data.table

别来无恙 提交于 2020-01-22 09:50:29

问题


This is what my data table looks like:

library(data.table)
dt <- fread('
    Product  Group    LastProductOfPriorGroup
    A          1          NA
    B          1          NA
    C          2          B
    D          2          B
    E          2          B
    F          3          E
    G          3          E
')

The LastProductOfPriorGroup column is my desired column. I am trying to fetch the product from last row of the prior group. So in the first two rows, there are no prior groups and therefore it is NA. In the third row, the product in the last row of the prior group 1 is B. I am trying to accomplish this by

dt[,LastGroupProduct:= shift(Product,1), by=shift(Group,1)]

to no avail.


回答1:


You could do

dt[, newcol := shift(dt[, last(Product), by = Group]$V1)[.GRP], by = Group]

This results in the following updated dt, where newcol matches your desired column with the unnecessarily long name. ;)

   Product Group LastProductOfPriorGroup newcol
1:       A     1                      NA     NA
2:       B     1                      NA     NA
3:       C     2                       B      B
4:       D     2                       B      B
5:       E     2                       B      B
6:       F     3                       E      E
7:       G     3                       E      E

Let's break the code down from the inside out. I will use ... to denote the accumulated code:

  • dt[, last(Product), by = Group]$V1 is getting the last values from each group as a character vector.
  • shift(...) shifts the character vector in the previous call
  • dt[, newcol := ...[.GRP], by = Group] groups by Group and uses the internal .GRP values for indexing

Update: Frank brings up a good point about my code above calculating the shift for every group over and over again. To avoid that, we can use either

shifted <- shift(dt[, last(Product), Group]$V1)
dt[, newcol := shifted[.GRP], by = Group]

so that we don't calculate the shift for every group. Or, we can take Frank's nice suggestion in the comments and do the following.

dt[dt[, last(Product), by = Group][, v := shift(V1)], on="Group", newcol := i.v] 



回答2:


Another way is to save the last group's value in a variable.

this = NA_character_    # initialize
dt[, LastProductOfPriorGroup:={ last<-this; this<-last(Product); last }, by=Group]
dt
   Product Group LastProductOfPriorGroup
1:       A     1                      NA
2:       B     1                      NA
3:       C     2                       B
4:       D     2                       B
5:       E     2                       B
6:       F     3                       E
7:       G     3                       E

NB: last() is a data.table function which returns the last item of a vector (of the Product column in this case).

This should also be fast since no logic is being invoked to fetch the last group's value; it just relies on the groups running in order (which they do).



来源:https://stackoverflow.com/questions/38285789/get-the-last-row-of-a-previous-group-in-data-table

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!