Generate random numbers with a given (numerical) distribution

我寻月下人不归 2020-11-22 11:18

I have a file with some probabilities for different values e.g.:

1 0.1
2 0.05
3 0.05
4 0.2
5 0.4
6 0.2

I would like to generate random numb

13 answers
  •  轻奢々 (original poster)
    2020-11-22 11:37

    None of these answers is particularly clear or simple.

    Here is a clear, simple method that is guaranteed to work.

    accumulate_normalize_values takes a dictionary p that maps symbols to probabilities OR frequencies (a list of (symbol, value) pairs also works). It returns a list of (symbol, cumulative probability) tuples from which to do the selection.

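    def accumulate_normalize_values(p):
        # Accept either a dict {symbol: value} or an iterable of (symbol, value) pairs.
        pi = p.items() if isinstance(p, dict) else p
        accum_pi = []
        accum = 0
        for symbol, value in pi:
            # The running total is the end-point of this symbol's interval.
            accum += value
            accum_pi.append((symbol, accum))
        if accum == 0:
            # Guard against dividing by zero in the normalization below.
            raise Exception("You are about to explode the universe. Continue ? Y/N ")
        # Divide every end-point by the total so that the last one is 1.0.
        return [(symbol, end * 1.0 / accum) for symbol, end in accum_pi]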
    

    Yields (on Python 3.7+, where dicts keep insertion order):

    >>> accumulate_normalize_values({'a': 100, 'b': 300, 'c': 400, 'd': 200})
    [('a', 0.1), ('b', 0.4), ('c', 0.8), ('d', 1.0)]
    

    Why it works

    The accumulation step turns each symbol into an interval that runs from the previous symbol's cumulative end-point (or 0 for the first symbol) to its own. To sample the provided distribution, draw a random number between 0.0 and 1.0 and step through the list until that number is less than or equal to the current symbol's interval end-point.

    The normalization releases us from the need to make sure the inputs sum to any particular value. After normalization the "vector" of probabilities sums to 1.0, so the last interval end-point is 1.0.
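
    As a hedged illustration of the interval idea, here is a minimal sketch that builds the same cumulative end-points with itertools.accumulate and looks a symbol up with bisect.bisect_left instead of a linear scan. The names build_intervals and pick are made up for this example and are not part of the code in this answer. It assumes Python 3.7+, so that the dict keeps insertion order.

```python
from bisect import bisect_left
from itertools import accumulate


def build_intervals(weights):
    """Return (symbols, end_points), with end_points normalized to end at 1.0."""
    symbols = list(weights)
    totals = list(accumulate(weights[s] for s in symbols))
    if not totals or totals[-1] <= 0:
        raise ValueError("weights must sum to a positive value")
    return symbols, [t / totals[-1] for t in totals]


def pick(symbols, end_points, r):
    """Return the symbol whose interval contains r, for 0.0 <= r <= 1.0."""
    # bisect_left finds the first end-point that is >= r,
    # which is the same rule the linear scan applies.
    return symbols[bisect_left(end_points, r)]


# Hypothetical frequencies, the same ones used in the answer's example.
symbols, end_points = build_intervals({'a': 100, 'b': 300, 'c': 400, 'd': 200})
print(end_points)                       # [0.1, 0.4, 0.8, 1.0]
print(pick(symbols, end_points, 0.05))  # a
print(pick(symbols, end_points, 0.4))   # b
print(pick(symbols, end_points, 0.95))  # d
```

    A binary search only pays off for large alphabets; for a handful of symbols the linear scan is just as good.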

    The rest of the code, for selecting a symbol and generating an arbitrarily long sample from the distribution, is below:

    from itertools import repeat
    from random import random


    def select(symbol_intervals, r):
        # Walk the intervals until r is at or below an end-point.
        i = 0
        while r > symbol_intervals[i][1]:
            i += 1
            if i >= len(symbol_intervals):
                raise Exception("What did you DO to that poor list?")
        return symbol_intervals[i][0]


    def gen_random(alphabet, length, probabilities=None):
        if probabilities is None:
            # No weights given: every symbol is equally likely.
            probabilities = dict(zip(alphabet, repeat(1.0)))
        elif (not isinstance(probabilities, dict) and len(probabilities) > 0
                and isinstance(probabilities[0], (int, float))):
            # A bare list of numbers: pair it with the alphabet, keeping order.
            probabilities = list(zip(alphabet, probabilities))
        usable_probabilities = accumulate_normalize_values(probabilities)
        return [select(usable_probabilities, random()) for _ in range(length)]
    

    Usage:

    >>> gen_random (['a','b','c','d'],10,[100,300,400,200])
    ['d', 'b', 'b', 'a', 'c', 'c', 'b', 'c', 'c', 'c']   #<--- some of the time
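
    For comparison, and only as a sketch: on Python 3.6+ the standard library's random.choices accepts a weights argument and does the same kind of weighted selection. The example below uses the values and probabilities from the question; the names values, weights and rng are assumptions for illustration. Counting the results is a quick way to check that the empirical frequencies come out close to the requested ones.

```python
import random
from collections import Counter

# Values and probabilities as listed in the question.
values = [1, 2, 3, 4, 5, 6]
weights = [0.1, 0.05, 0.05, 0.2, 0.4, 0.2]

# A fixed seed is used here only to make the run repeatable.
rng = random.Random(0)
sample = rng.choices(values, weights=weights, k=100_000)

# Compare the requested probability with the observed frequency.
counts = Counter(sample)
for value, weight in zip(values, weights):
    print(value, weight, counts[value] / len(sample))
```

    The weights do not have to sum to 1.0 here either; random.choices treats them as relative weights.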
    
