I have a file with probabilities for different values, e.g.:
1 0.1
2 0.05
3 0.05
4 0.2
5 0.4
6 0.2
I would like to generate random numbers using this distribution.
Building on the other solutions: generate a cumulative distribution (as integers or floats, whichever you prefer), then use bisect to make the lookup fast.
Here is a simple example (I used integers here):
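import bisect
import random

l = [(20, 'foo'), (60, 'banana'), (10, 'monkey'), (10, 'monkey2')]
def get_cdf(l):
    ret = []
    c = 0
    for weight, item in l:
        c += weight  # running total: 20, 80, 90, 100
        ret.append((c, item))
    return ret

def get_random_item(cdf):
    # draw from 1..total so each item's share matches its weight
    r = random.randint(1, cdf[-1][0])
    return cdf[bisect.bisect_left(cdf, (r,))][1]

cdf = get_cdf(l)
for i in range(100):
    print(get_random_item(cdf), end=' ')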
The get_cdf function converts the weights 20, 60, 10, 10 into the running totals 20, 20+60, 20+60+10, 20+60+10+10.
Now we pick a random number up to 20+60+10+10 using random.randint,
then we use bisect to find the matching item quickly.
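
The same idea can be applied to the float probabilities from the question. The following is only a minimal sketch under some assumptions: the file has already been parsed into two lists, and the names values, probs and sample are made up for illustration. It uses random.random instead of random.randint because the weights are floats.

```python
import bisect
import itertools
import random

# Values and probabilities copied from the question's example;
# reading them from the file is assumed to have happened already.
values = [1, 2, 3, 4, 5, 6]
probs = [0.1, 0.05, 0.05, 0.2, 0.4, 0.2]

# Running totals of the probabilities (roughly 0.1, 0.15, 0.2, 0.4, 0.8, 1.0).
cdf = list(itertools.accumulate(probs))

def sample(values, cdf, rng=random):
    """Return one value, picked with probability proportional to its weight."""
    # rng.random() lies in [0, 1); scaling by the final total covers the case
    # where float rounding keeps the probabilities from summing to exactly 1.
    r = rng.random() * cdf[-1]
    # bisect_right returns the index of the first running total greater than r.
    # The min() is a guard so the index can never run past the last value.
    idx = min(bisect.bisect_right(cdf, r), len(values) - 1)
    return values[idx]
```

For example, [sample(values, cdf) for _ in range(10)] draws ten values. On Python 3.6 and later, random.choices(values, weights=probs, k=10) from the standard library covers the same use case without a hand-built cumulative list.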