Efficiently generating multiple instances of numpy.random.choice without replacement

一个人的身影 2020-12-10 18:48

I'm new to Python, so please also mention any other suggestions for improving my Python code.

Question: How do I generate a 8x

4 Answers
  • 2020-12-10 18:52

    How about shuffling, that is to say, permuting?

    import numpy
    from timeit import Timer

    def B_rand_M(N):
        # each column is an independent random permutation of 1..8
        a = numpy.arange(1, 9)
        M = numpy.zeros(shape=(8, N))
        for i in range(N):
            M[:, i] = numpy.random.permutation(a)
        return M

    # your original implementation
    def J_rand_M(N):
        M = numpy.zeros(shape=(8, N))
        for i in range(N):
            # draw 8 distinct values from 0..7, then shift to 1..8
            M[:, i] = numpy.random.choice(8, size=8, replace=False) + 1
        return M
    
    

    Some timings:

    def compare(N):
        # time 6 calls of each implementation for the given N
        for f in (J_rand_M, B_rand_M):
            t = Timer(lambda: f(N)).timeit(6)
            print('time for %s(%s): %.6f' % (f.__name__, N, t))
    
    for i in range(6):
        print('N = 10^%s' % i)
        compare(10**i)
        print()
    

    This gives:

    N = 10^0
    time for J_rand_M(1): 0.001199
    time for B_rand_M(1): 0.000080
    
    N = 10^1
    time for J_rand_M(10): 0.001112
    time for B_rand_M(10): 0.000335
    
    N = 10^2
    time for J_rand_M(100): 0.011118
    time for B_rand_M(100): 0.003022
    
    N = 10^3
    time for J_rand_M(1000): 0.110887
    time for B_rand_M(1000): 0.030528
    
    N = 10^4
    time for J_rand_M(10000): 1.100540
    time for B_rand_M(10000): 0.304696
    
    N = 10^5
    time for J_rand_M(100000): 11.151576
    time for B_rand_M(100000): 3.049474
    
  • 2020-12-10 19:02

    Create a random array of the specified shape and then argsort it along the axis within which the values must stay distinct. This gives a vectorized and very efficient solution, based on this smart answer to "MATLAB randomly permuting columns differently". The implementation is the one-liner used in the sample run and wrapped as sortbased_rand8 in the runtime tests.

    Sample run -

    In [122]: N = 10
    
    In [123]: np.argsort(np.random.rand(8,N),axis=0)+1
    Out[123]: 
    array([[7, 3, 5, 1, 1, 5, 2, 4, 1, 4],
           [8, 4, 3, 2, 2, 8, 5, 5, 6, 2],
           [1, 2, 4, 6, 5, 4, 4, 3, 4, 7],
           [5, 6, 2, 5, 8, 2, 7, 8, 5, 8],
           [2, 8, 6, 3, 4, 7, 1, 1, 2, 6],
           [6, 7, 7, 8, 6, 6, 3, 2, 7, 3],
           [4, 1, 1, 4, 3, 3, 8, 6, 8, 1],
           [3, 5, 8, 7, 7, 1, 6, 7, 3, 5]], dtype=int64)
    

    Runtime tests -

    In [124]: def sortbased_rand8(N):
         ...:     return np.argsort(np.random.rand(8,N),axis=0)+1
         ...: 
         ...: def rand_M(N):
         ...:     M = np.zeros(shape = (8, N))
         ...:     for i in range (0, N):
         ...:         M[:, i] = np.random.choice(8, size = 8, replace = False) + 1 
         ...:     return M
         ...: 
    
    In [125]: N = 5000
    
    In [126]: %timeit sortbased_rand8(N)
    100 loops, best of 3: 1.95 ms per loop
    
    In [127]: %timeit rand_M(N)
    1 loops, best of 3: 233 ms per loop
    

    That is roughly a 120x speedup (233 ms versus 1.95 ms for N = 5000).
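
    For newer NumPy versions, the same idea can be sketched with the Generator API. This is only an illustration under assumptions: it assumes numpy.random.default_rng (NumPy 1.17+) and Generator.permuted (NumPy 1.20+) are available, and the function names argsort_perms and permuted_perms are hypothetical, not part of the answer above.

```python
import numpy as np

def argsort_perms(n_cols, rng):
    # argsort of i.i.d. uniform draws down each column is a random
    # permutation of 0..7; adding 1 shifts it to 1..8
    return np.argsort(rng.random((8, n_cols)), axis=0) + 1

def permuted_perms(n_cols, rng):
    # start with 1..8 in every column, then shuffle each column independently
    base = np.tile(np.arange(1, 9)[:, None], (1, n_cols))
    return rng.permuted(base, axis=0)

rng = np.random.default_rng(0)  # the seed is an arbitrary choice for the example
M1 = argsort_perms(5, rng)
M2 = permuted_perms(5, rng)
print(M1.shape, M2.shape)
```

    Both functions return an 8 x n_cols integer array whose columns each contain 1..8 exactly once. Which of the two is faster may depend on the NumPy version and on N, so it is worth timing both.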

  • 2020-12-10 19:13

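    Use the code below to generate your array:

    import numpy as np
    N = 10**7  # the number of columns you want; must be an int, not the float 1e7
    # high is exclusive, so high=9 yields values 1..8;
    # note that randint draws with replacement, so a column may repeat values
    np.random.randint(1, high=9, size=(8, N))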
    

    Hope this helps; it should not take much time to run.
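
    Whether a generated matrix really samples without replacement can be checked directly. The sketch below is a minimal illustration; the helper name columns_are_permutations is hypothetical, and it assumes the goal is an 8 x N matrix whose columns each contain 1..8 exactly once.

```python
import numpy as np

def columns_are_permutations(M, k=8):
    # True if M has k rows and every column contains each of 1..k exactly once
    M = np.asarray(M)
    target = np.arange(1, k + 1)[:, None]
    return M.shape[0] == k and bool((np.sort(M, axis=0) == target).all())

rng = np.random.default_rng(0)
# independent draws from 1..8 (with replacement); high=9 is exclusive
with_replacement = rng.integers(1, 9, size=(8, 1000))
# argsort of uniform noise gives one permutation of 1..8 per column
without_replacement = np.argsort(rng.random((8, 1000)), axis=0) + 1

print(columns_are_permutations(with_replacement))     # False
print(columns_are_permutations(without_replacement))  # True
```

    Independent draws make a given column a permutation with probability 8!/8^8, about 0.24%, so a matrix built that way essentially never passes the check.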

  • 2020-12-10 19:14

    Just a comment on your runtime analysis of the problem: my intuition is that O(n) is the best runtime you can obtain when generating O(n) truly random numbers.

    Have you tried actually running your code with n = 10 million? Your assumption that the runtime will scale by 1000 when the input grows by a factor of 1000 may not be true in practice, as there is usually a constant term when executing any program (loading libraries, etc.), which may be significant depending on the problem.
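
    One way to test the scaling assumption is to measure the time per column at several sizes. The sketch below is only an illustration: rand_M mirrors the loop over numpy.random.choice discussed in this thread, time_per_column is a hypothetical helper, and the sizes are kept small so that it runs quickly.

```python
import numpy as np
from timeit import timeit

def rand_M(n_cols):
    # loop-based version: one call to choice per column
    M = np.zeros(shape=(8, n_cols))
    for i in range(n_cols):
        M[:, i] = np.random.choice(8, size=8, replace=False) + 1
    return M

def time_per_column(n_cols, repeats=3):
    # average wall-clock seconds spent per generated column
    total = timeit(lambda: rand_M(n_cols), number=repeats)
    return total / (repeats * n_cols)

timings = {n: time_per_column(n) for n in (10, 100, 1000)}
for n, t in timings.items():
    print(f"N = {n}: {t * 1e6:.2f} microseconds per column")
```

    If the per-column time is roughly constant across sizes, the runtime is close to linear in N and extrapolating to 10 million columns is reasonable. A noticeably higher figure at small N points to fixed overhead.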

    That being said, it looks like the question linked by Eric Wright does a very thorough job and can easily be adapted to fit your question.
