How do you sample groups in a data.table with a caveat

半阙折子戏 2021-01-13 06:02

This question is very similar to "Sample random rows within each group in a data.table".

The difference is in a minor subtlety that I did not have enough reputation to

1 Answer
  •  独厮守ぢ
    2021-01-13 06:33

    I might be misunderstanding your question, but are you looking for something like this?

    library(data.table)
    set.seed(123)
    ##
    DT <- data.table(
      a=c(1,1,1,1:15,1,1), 
      b=sample(1:1000,20))
    ##
    R> DT[,.SD[sample(.N,min(.N,3))],by = a]
         a   b
     1:  1 288
     2:  1 881
     3:  1 409
     4:  2 937
     5:  3  46
     6:  4 525
     7:  5 887
     8:  6 548
     9:  7 453
    10:  8 948
    11:  9 449
    12: 10 670
    13: 11 566
    14: 12 102
    15: 13 993
    16: 14 243
    17: 15  42
    

    Here we draw 3 rows (without replacement) from each group a_i that contains three or more rows; otherwise we draw all n rows, where n (n < 3) is the size of group a_i.
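As a variation on the same idea, here is a minimal sketch that samples row numbers with `.I` instead of subsetting `.SD`, then indexes `DT` once. It reuses the `DT` definition from above; the names `idx`, `res`, and `sizes` are introduced here only for illustration. This form is often suggested as quicker on large tables, but that is an assumption to check against your own data.

```r
library(data.table)

set.seed(123)
DT <- data.table(
  a = c(1, 1, 1, 1:15, 1, 1),
  b = sample(1:1000, 20)
)

# For each group, pick up to 3 of its global row numbers (.I);
# the unnamed result column is called V1.
idx <- DT[, .I[sample(.N, min(.N, 3))], by = a]$V1

# Subset the original table once with the collected row numbers.
res <- DT[idx]

# Compare each group's original size with the number of rows sampled.
sizes <- merge(
  DT[, .(n_total = .N), by = a],
  res[, .(n_sampled = .N), by = a],
  by = "a"
)
print(sizes)
```

The `sizes` table should show `n_sampled` equal to 3 for the group with six rows and equal to `n_total` for every group with fewer than three rows.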

    Just for demonstration, here are the 6 possible values of b for a=1 that we are sampling from (assuming you use the same random seed as above):

    R> DT[order(a)][1:6,]
       a   b
    1: 1 288
    2: 1 788
    3: 1 409
    4: 1 881
    5: 1 323
    6: 1 996
    
