Numpy drawing from urn

前端 未结 2 1947
遇见更好的自我
遇见更好的自我 2020-12-19 13:12

I want to run a relatively simple random draw in numpy, but I can\'t find a good way to express it. I think the best way is to describe it as drawing from an urn without rep

2条回答
  •  一生所求
    2020-12-19 13:25

    What you want is an implementation of the multivariate hypergeometric distribution. I don't know of one in numpy or scipy, but it might already exist out there somewhere.

    I contributed an implementation of the multivariate hypergeometric distribution to numpy 1.18.0; see numpy.random.Generator.multivariate_hypergeometric.

    For example, to draw 15 samples from an urn containing 12 red, 4 green and 18 blue marbles, and repeat the process 10 times:

    In [4]: import numpy as np
    
    In [5]: rng = np.random.default_rng()
    
    In [6]: colors = [12, 4, 18]
    
    In [7]: rng.multivariate_hypergeometric(colors, 15, size=10)                    
    Out[7]: 
    array([[ 5,  4,  6],
           [ 3,  3,  9],
           [ 6,  2,  7],
           [ 7,  2,  6],
           [ 3,  0, 12],
           [ 5,  2,  8],
           [ 6,  2,  7],
           [ 7,  1,  7],
           [ 8,  1,  6],
           [ 6,  1,  8]])
    

    The rest of this answer is now obsolete, but I'll leave for posterity (whatever that mean...).


    You can implement it using repeated calls to numpy.random.hypergeometric. Whether that will be more efficient than your implementation depends on how many colors there are and how many balls of each color.

    For example, here's a script that prints the result of drawing from an urn containing three colors (red, green and blue):

    from __future__ import print_function
    
    import numpy as np
    
    
    nred = 12
    ngreen = 4
    nblue = 18
    
    m = 15
    
    red = np.random.hypergeometric(nred, ngreen + nblue, m)
    green = np.random.hypergeometric(ngreen, nblue, m - red)
    blue = m - (red + green)
    
    print("red:   %2i" % red)
    print("green: %2i" % green)
    print("blue:  %2i" % blue)
    

    Sample output:

    red:    6
    green:  1
    blue:   8
    

    The following function generalizes that to choosing m balls given an array colors holding the number of each color:

    def sample(m, colors):
        """
        Parameters
        ----------
        m : number balls to draw from the urn
        colors : one-dimensional array of number balls of each color in the urn
    
        Returns
        -------
        One-dimensional array with the same length as `colors` containing the
        number of balls of each color in a random sample.
        """
    
        remaining = np.cumsum(colors[::-1])[::-1]
        result = np.zeros(len(colors), dtype=np.int)
        for i in range(len(colors)-1):
            if m < 1:
                break
            result[i] = np.random.hypergeometric(colors[i], remaining[i+1], m)
            m -= result[i]
        result[-1] = m
        return result
    

    For example,

    >>> sample(10, [2, 4, 8, 16])
    array([2, 3, 1, 4])
    

提交回复
热议问题