First observation by group using self-join

情话喂你 2020-12-15 07:07

I'm trying to get the top row of each group, grouping by three variables, using a data.table.

I have a working solution:

col1 <- c(1,1,1,1,2,2,2,2,3,3,3,3)
col         


        
2 Answers
  • 2020-12-15 07:27

    What about:

    solution2 <- data.table(data)[ , sales[1], by="store,year,month"]
    > solution2
       store year month V1
    1:     1 2000    12  1
    2:     1 2001    12  3
    3:     2 2000    12  5
    4:     2 2001    12  7
    5:     3 2000    12  9
    6:     3 2001    12 11
    

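    You could also give that column a name. Use list() in j rather than :=, since := would add the column to every row instead of returning one row per group:

    data.table(data)[, list(fsales = sales[1]), by = "store,year,month"]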
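
    For anyone who wants to run this end to end, here is a minimal sketch. The data frame is an assumption: it is rebuilt from the printed results in this thread (two rows per store/year/month group, sales running from 1 to 12), not copied from the question. It takes the first sales value per group and then renames the auto-generated V1 column with setnames().

    ```r
    library(data.table)

    # Assumed example data, reconstructed from the printed output in this thread.
    data <- data.frame(
      store = rep(1:3, each = 4),
      year  = rep(rep(c(2000, 2001), each = 2), times = 3),
      month = 12,
      sales = 1:12
    )

    # First sales value within each (store, year, month) group.
    solution2 <- data.table(data)[, sales[1], by = "store,year,month"]

    # Rename the auto-generated V1 column in place.
    setnames(solution2, "V1", "fsales")
    print(solution2)  # fsales: 1 3 5 7 9 11
    ```

    setnames() changes the column name by reference, so no copy of the result is made.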
    
  • 2020-12-15 07:34

    Option 1 (using keys)

    Set the key to store, year, month:

    DT <- data.table(data, key = c('store','year','month'))
    

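    Then you can use unique with by = key(DT) to get one row per combination of the key columns; it keeps the first row of each group. (Since data.table 1.9.8, unique(DT) without by compares all columns, so pass the key explicitly.)

    unique(DT, by = key(DT))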
       store year month sales
    1:     1 2000    12     1
    2:     1 2001    12     3
    3:     2 2000    12     5
    4:     2 2001    12     7
    5:     3 2000    12     9
    6:     3 2001    12    11
    

    To be explicit about which row is returned, you can instead use a self-join with mult = 'first' (the other options are 'all' and 'last'):

    # DT[, key(DT), with = FALSE] keeps only the key columns, so the join
    # result does not end up with two sales columns
    DT[unique(DT[,key(DT), with = FALSE]), mult = 'first']
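
    To see what mult changes, here is a small sketch comparing 'first' and 'last' on the same self-join. The table is an assumption, rebuilt from the printed results in this thread rather than taken from the question.

    ```r
    library(data.table)

    # Assumed example data: two rows per (store, year, month) group.
    DT <- data.table(
      store = rep(1:3, each = 4),
      year  = rep(rep(c(2000, 2001), each = 2), times = 3),
      month = 12,
      sales = 1:12,
      key   = c("store", "year", "month")
    )

    # One row per key combination, key columns only.
    groups <- unique(DT[, key(DT), with = FALSE])

    # Self-join on the key; mult picks which matching row to keep per group.
    first_rows <- DT[groups, mult = "first"]
    last_rows  <- DT[groups, mult = "last"]

    print(first_rows)  # sales: 1 3 5 7 9 11
    print(last_rows)   # sales: 2 4 6 8 10 12
    ```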
    

    Option 2 (No keys)

    Without setting the key, it is faster to select row numbers with .I than to subset .SD:

    DTb <- data.table(data)
    DTb[DTb[,list(row1 = .I[1]), by = list(store, year, month)][,row1]]
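
    As a rough sketch of the two approaches side by side (again on assumed data rebuilt from the printed results in this thread): .SD[1] takes the first row of each group's subset, while .I[1] only collects one row number per group and then subsets the table once. Both should return the same rows; how much faster .I is will depend on the data size and the data.table version.

    ```r
    library(data.table)

    # Assumed example data: two rows per (store, year, month) group, no key set.
    DTb <- data.table(
      store = rep(1:3, each = 4),
      year  = rep(rep(c(2000, 2001), each = 2), times = 3),
      month = 12,
      sales = 1:12
    )

    # Approach A: first row of the per-group subset .SD.
    via_sd <- DTb[, .SD[1], by = list(store, year, month)]

    # Approach B: first row number per group via .I, then a single subset.
    idx   <- DTb[, list(row1 = .I[1]), by = list(store, year, month)][, row1]
    via_i <- DTb[idx]

    print(idx)  # 1 3 5 7 9 11
    print(identical(via_sd$sales, via_i$sales))
    ```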
    