How to do a weighted random sample of categories in Python


Given a list of tuples where each tuple consists of a probability and an item, I'd like to sample an item according to its probability. For example, given the list [ (.3, 'a'),

9 answers
  •  暖寄归人
    2021-01-31 18:18

    Inspired by sholte's very straightforward (and correct) answer, I'll demonstrate how easily it extends to arbitrary items:

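    In []: from numpy import arange, array, histogram
    In []: from numpy.random import sample
    In []: # map 54 uniform draws onto category indices 0..2 through the cumulative probabilities
    In []: s = array([.3, .4, .3]).cumsum().searchsorted(sample(54))
    In []: c, _ = histogram(s, bins=arange(4))  # count how often each index was drawn
    In []: [item * c[i] for i, item in enumerate('abc')]
    Out[]: ['aaaaaaaaaaaa', 'bbbbbbbbbbbbbbbbbbbbbbbbbb', 'cccccccccccccccc']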
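
    The session above only counts how often each category is drawn. As a minimal sketch of the same cumulative-sum idea applied to the question's (probability, item) tuples, something like the following could work. The function name `weighted_sample`, the `seed` parameter, and the completed example list are assumptions made for illustration (the list in the question is cut off, so the probabilities and letters are taken from the session above).

```python
import numpy as np

def weighted_sample(pairs, size, seed=None):
    """Draw `size` items from (probability, item) pairs via the cumulative distribution.

    Hypothetical helper, not part of NumPy.
    """
    rng = np.random.default_rng(seed)
    probs = np.array([p for p, _ in pairs], dtype=float)
    items = [item for _, item in pairs]
    cdf = probs.cumsum()
    cdf /= cdf[-1]  # normalise so the last edge is exactly 1.0
    # side="right" maps a uniform draw u in [0, 1) to the first index i with u < cdf[i]
    idx = cdf.searchsorted(rng.random(size), side="right")
    return [items[i] for i in idx]

# Assumed example data: probabilities and items taken from the session above
pairs = [(0.3, "a"), (0.4, "b"), (0.3, "c")]
draws = weighted_sample(pairs, size=54, seed=0)
```

    Passing side="right" means an item with probability 0 is never selected, even when the uniform draw is exactly 0.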
    

    Update:
    Based on phant0m's feedback, an even more straightforward solution can be built on multinomial:

    In []: from numpy.random import multinomial
    In []: s = multinomial(54, [.3, .4, .3])  # number of draws per category, summing to 54
    In []: [item * s[i] for i, item in enumerate('abc')]
    Out[]: ['aaaaaaaaaaaaaaa', 'bbbbbbbbbbbbbbbbbbbbbbbbbbb', 'cccccccccccc']
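
    Note that multinomial returns only the per-category counts, not the order in which the items were drawn. If individual draws are needed, one option (a sketch, assuming NumPy's newer Generator interface and a fixed seed chosen purely for reproducibility) is to expand the counts and shuffle them:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed 0 is an arbitrary choice for reproducibility
items = ["a", "b", "c"]
probs = [0.3, 0.4, 0.3]

# One multinomial experiment: how many of the 54 draws fall into each category
counts = rng.multinomial(54, probs)
counts_by_item = dict(zip(items, counts.tolist()))

# Expand the counts into individual draws, then shuffle in place,
# because the counts carry no information about draw order
draws = np.repeat(items, counts)
rng.shuffle(draws)
```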
    

    IMHO this gives a nice summary of two approaches, sampling based on the empirical CDF and sampling based on multinomial, which yield similar results. Pick whichever best suits your purposes.
