How to do a weighted random sample of categories in Python

不要未来只要你来 2021-01-31 17:32

Given a list of tuples where each tuple consists of a probability and an item, I'd like to sample an item according to its probability. For example, given the list [ (.3, 'a'),

9 answers

暖寄归人 (OP)

2021-01-31 18:18
Inspired by sholte's very straightforward (and correct) answer, I'll just demonstrate how easily it can be extended to handle arbitrary items:
```
In []: from numpy import arange, array, histogram
In []: from numpy.random import sample
In []: s = array([.3, .4, .3]).cumsum().searchsorted(sample(54))
In []: c, _ = histogram(s, bins=arange(4))
In []: [item * c[i] for i, item in enumerate('abc')]
Out[]: ['aaaaaaaaaaaa', 'bbbbbbbbbbbbbbbbbbbbbbbbbb', 'cccccccccccccccc']
```
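For the list-of-tuples input from the question, the same cumulative-sum idea might be wrapped up roughly as follows. This is only a sketch: the function name `sample_items` and its `seed` parameter are made up for illustration, and it assumes the weights are non-negative with a positive total.

```python
import numpy as np


def sample_items(weighted_items, n, seed=None):
    """Draw n items from (probability, item) pairs using the cumulative distribution.

    Hypothetical helper, not part of NumPy.
    """
    probs = np.array([p for p, _ in weighted_items], dtype=float)
    items = [item for _, item in weighted_items]
    # Normalise so the last cdf entry is (up to rounding) 1.0.
    cdf = np.cumsum(probs / probs.sum())
    rng = np.random.default_rng(seed)
    u = rng.random(n)  # uniform draws in [0, 1)
    # side="right" maps u to the first index whose cdf value is strictly greater than u.
    idx = np.searchsorted(cdf, u, side="right")
    # Guard against the last cdf entry landing slightly below 1.0 due to rounding.
    idx = np.minimum(idx, len(items) - 1)
    return [items[i] for i in idx]


if __name__ == "__main__":
    draws = sample_items([(.3, 'a'), (.4, 'b'), (.3, 'c')], 54, seed=0)
    print(''.join(draws))
```

Unlike the histogram version, this returns the individual draws in order, so the items can be any Python objects rather than single characters.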
Update:
Based on feedback from phant0m, it turns out that an even more straightforward solution can be built on multinomial:
```
In []: from numpy.random import multinomial
In []: s = multinomial(54, [.3, .4, .3])
In []: [item * s[i] for i, item in enumerate('abc')]
Out[]: ['aaaaaaaaaaaaaaa', 'bbbbbbbbbbbbbbbbbbbbbbbbbbb', 'cccccccccccc']
```
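Note that multinomial returns counts per category rather than an ordered sequence of draws. If an ordered sample is needed, one possible approach is to expand the counts and shuffle them. The helper names `sample_counts` and `expand_counts` below are hypothetical, and the sketch again assumes non-negative weights with a positive total.

```python
import numpy as np


def sample_counts(weighted_items, n, seed=None):
    """Return [(item, count), ...] for n multinomial draws (hypothetical helper)."""
    probs = np.array([p for p, _ in weighted_items], dtype=float)
    items = [item for _, item in weighted_items]
    rng = np.random.default_rng(seed)
    # Normalise the weights so they sum to 1 before drawing.
    counts = rng.multinomial(n, probs / probs.sum())
    return list(zip(items, counts.tolist()))


def expand_counts(item_counts, seed=None):
    """Turn [(item, count), ...] into a shuffled flat list of items."""
    rng = np.random.default_rng(seed)
    labels = [item for item, _ in item_counts]
    counts = [count for _, count in item_counts]
    # Repeat each label index by its count, then shuffle the order in place.
    idx = np.repeat(np.arange(len(labels)), counts)
    rng.shuffle(idx)
    return [labels[i] for i in idx]


if __name__ == "__main__":
    pairs = sample_counts([(.3, 'a'), (.4, 'b'), (.3, 'c')], 54, seed=0)
    print(pairs)
    print(''.join(expand_counts(pairs, seed=0)))
```

If only the per-category totals matter, `sample_counts` alone is enough and the shuffle step can be skipped.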
IMHO this gives a nice summary of empirical-CDF-based and multinomial-based sampling, which yield similar results. In short, pick whichever best suits your purposes.