How do you sample groups in a data.table with a caveat

半阙折子戏 2021-01-13 06:02

This question is very similar to Sample random rows within each group in a data.table.

The difference is in a minor subtlety that I did not have enough reputation to

1 answer

独厮守ぢ (OP)

2021-01-13 06:33
I might be misunderstanding your question, but are you looking for something like this?
```
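library(data.table)

set.seed(123)
## Example data: group a = 1 has six rows, every other group has one
DT <- data.table(
  a = c(1, 1, 1, 1:15, 1, 1),
  b = sample(1:1000, 20))
## Within each group, sample up to 3 rows without replacement
R> DT[, .SD[sample(.N, min(.N, 3))], by = a]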
     a   b
 1:  1 288
 2:  1 881
 3:  1 409
 4:  2 937
 5:  3  46
 6:  4 525
 7:  5 887
 8:  6 548
 9:  7 453
10:  8 948
11:  9 449
12: 10 670
13: 11 566
14: 12 102
15: 13 993
16: 14 243
17: 15  42
```
Here we draw 3 rows of b (without replacement) for each group a_i that contains three or more rows; otherwise we draw all n rows, where n (n < 3) is the size of group a_i.

Just for demonstration, here are the 6 possible values of b for a=1 that we are sampling from (assuming you use the same random seed as above):
```
R> DT[order(a)][1:6,]
   a   b
1: 1 288
2: 1 788
3: 1 409
4: 1 881
5: 1 323
6: 1 996
```
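
As a rough check of the capped sampling described above, the sketch below wraps the same expression in a small helper and compares group sizes before and after sampling. The helper name `sample_up_to` and its arguments `k` and `group_col` are hypothetical and not part of the original answer; this is one possible way to verify the behavior, not the only one.

```r
library(data.table)

# Hypothetical helper (an assumption, not from the original answer):
# draw up to `k` rows per group, without replacement.
sample_up_to <- function(dt, k, group_col) {
  dt[, .SD[sample(.N, min(.N, k))], by = group_col]
}

set.seed(123)
DT <- data.table(
  a = c(1, 1, 1, 1:15, 1, 1),
  b = sample(1:1000, 20)
)

res <- sample_up_to(DT, k = 3, group_col = "a")

# Rows per group before and after sampling, side by side.
sizes <- merge(
  DT[, .(n_before = .N), by = a],
  res[, .(n_after = .N), by = a],
  by = "a"
)

# Every group keeps min(group size, 3) rows.
stopifnot(all(sizes$n_after == pmin(sizes$n_before, 3)))
print(sizes)
```

With this data, group `a = 1` has six rows and is cut down to three, while each of the other fourteen groups has a single row and is returned unchanged, so `res` has 17 rows regardless of the seed.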