Select cells randomly from NumPy array - without replacement

萌比男神i 2021-02-19 16:41

I'm writing some modelling routines in NumPy that need to select cells randomly from a NumPy array and do some processing on them. All cells must be selected without replacement

6 Answers
  • 2021-02-19 16:53

    Use random.sample to generate ints in 0 .. A.size - 1 with no duplicates, then split them into index pairs:

    import random
    import numpy as np
    
    def randint2_nodup( nsample, A ):
        """ uniform int pairs, no dups:
            r = randint2_nodup( nsample, A )
            A[r]
            for jk in zip(*r):
                ... A[jk]
        """
        assert A.ndim == 2
        sample = np.array( random.sample( range( A.size ), nsample ))  # nodup ints
        return sample // A.shape[1], sample % A.shape[1]  # pairs
    
    
    if __name__ == "__main__":
        import sys
    
        nsample = 8
        ncol = 5
        exec( "\n".join( sys.argv[1:] ))  # run this.py nsample=... ncol=...
        A = np.arange( 0, 2*ncol ).reshape((2,ncol))
    
        r = randint2_nodup( nsample, A )
        print( "r:", r )
        print( "A[r]:", A[r] )
        for jk in zip(*r):
            print( jk, A[jk] )
    
  • 2021-02-19 17:01

    With NumPy 1.7 or later you can also use the built-in function numpy.random.choice; pass replace=False to sample without replacement.
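
    A minimal sketch of that approach, assuming a small 2D array A and a sample size nsample made up for illustration: draw distinct flat indices with replace=False, then turn them into row/column pairs with numpy.unravel_index.

```python
import numpy as np

# Hypothetical example array; any shape works.
A = np.arange(10).reshape(2, 5)
nsample = 8

# Draw nsample distinct flat indices from 0 .. A.size - 1.
flat = np.random.choice(A.size, size=nsample, replace=False)

# Convert the flat indices into (row, column) index arrays.
rows, cols = np.unravel_index(flat, A.shape)

for j, k in zip(rows, cols):
    print((j, k), A[j, k])
```

    Setting nsample to A.size visits every cell exactly once, in random order.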

  • 2021-02-19 17:08

    Let's say you have an array of data points of size 8x3

    data = np.arange(50,74).reshape(8,-1)
    

    If you truly want to sample, as you say, all the indices as 2d pairs, the most compact way to do this that I can think of is:

    #generate a permutation of data's size, coerced to data's shape
    idxs = divmod(np.random.permutation(data.size),data.shape[1])
    
    #iterate over it
    for x,y in zip(*idxs): 
        #do something to data[x,y] here
        pass
    

    More generally, though, one often does not need to access 2d arrays as 2d arrays simply to shuffle 'em, in which case one can be yet more compact. Just make a 1d view onto the array and save yourself some index-wrangling.

    flat_data = data.ravel()
    flat_idxs = np.random.permutation(flat_data.size)
    for i in flat_idxs:
        #do something to flat_data[i] here
        pass
    

    Writes through flat_data still reach the 2d "original" array, because ravel() returns a view here. To see this, try:

    flat_data[12] = 1000000
    print(data[4,0])
    #prints 1000000
    
  • 2021-02-19 17:11

    Extending the nice answer from @WoLpH

    For a 2D array I think it will depend on what you want or need to know about the indices.

    You could do something like this:

    data = np.arange(25).reshape((5,5))
    
    x, y = np.where(data == data)  # row and column index of every cell
    idx = list(zip(x, y))
    np.random.shuffle(idx)
    

    OR

    data = np.arange(25).reshape((5,5))
    
    grid = np.indices(data.shape)
    idx = list(zip( grid[0].ravel(), grid[1].ravel() ))
    np.random.shuffle(idx)
    

    You can then use the list idx to iterate over randomly ordered 2D array indices as you wish, and to get the values at that index out of the data which remains unchanged.

    Note: You could also generate the randomly ordered indices via itertools.product too, in case you are more comfortable with this set of tools.
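
    A rough sketch of the itertools.product variant, reusing the 5x5 data array from above; it shuffles with the standard-library random.shuffle, which is one option among several:

```python
import itertools
import random

import numpy as np

data = np.arange(25).reshape((5, 5))

# Every (row, column) pair, built from one range per axis.
idx = list(itertools.product(*(range(n) for n in data.shape)))

# Shuffle the list of index tuples in place.
random.shuffle(idx)

for j, k in idx:
    pass  # do something with data[j, k] here
```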

  • 2021-02-19 17:15

    How about using numpy.random.shuffle or numpy.random.permutation if you still need the original array?

    If you need to change the array in-place then you can create an index array like this:

    import numpy

    your_array = numpy.arange(20).reshape(4, 5)  # stand-in for <some numpy array>
    index_array = numpy.arange(your_array.size)
    numpy.random.shuffle(index_array)
    
    print(your_array.ravel()[index_array[:10]])
    
  • 2021-02-19 17:18

    All of these answers seemed a little convoluted to me.

    I'm assuming that you have a multi-dimensional array from which you want to generate an exhaustive list of indices. You'd like these indices shuffled so you can then access each of the array elements in a random order.

    The following code will do this in a simple and straight-forward manner:

    #!/usr/bin/env python3
    import numpy as np
    
    #Define a two-dimensional array
    #Use any number of dimensions, and dimensions of any size
    d=np.zeros(30).reshape((5,6))
    
    #Get a list of indices for an array of this shape
    indices=list(np.ndindex(d.shape))
    
    #Shuffle the indices in-place
    np.random.shuffle(indices)
    
    #Access array elements using the indices to do cool stuff
    for i in indices:
      d[i]=5
    
    print(d)
    

    Printing d verified that all elements have been accessed.

    Note that the array can have any number of dimensions and that the dimensions can be of any size.

    The only downside to this approach is that if d is large, then indices may become pretty sizable. Therefore, it would be nice to have a generator. Sadly, I can't think offhand of how to build a shuffled iterator.
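
    One possible partial workaround, sketched under assumptions: the helper name shuffled_indices is hypothetical, and the approach still stores one flat permutation of d.size integers. It does avoid the Python list of tuples by converting each flat position to an index tuple only when it is requested, using numpy.unravel_index.

```python
import numpy as np


def shuffled_indices(shape):
    """Yield every index tuple of an array of this shape once, in random order."""
    size = int(np.prod(shape))
    # One flat permutation is still stored, but as a compact integer array
    # rather than a Python list of tuples.
    for flat in np.random.permutation(size):
        # Convert the flat position back to an N-dimensional index tuple.
        yield tuple(int(i) for i in np.unravel_index(flat, shape))


d = np.zeros((5, 6))
for i in shuffled_indices(d.shape):
    d[i] = 5
```

    This is not a truly constant-memory shuffled iterator; it only reduces the per-index overhead.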
