Fastest way to generate 1,000,000+ random numbers in Python

时光取名叫无心 2020-12-13 19:28

I am currently writing an app in Python that needs to generate a large amount of random numbers, FAST. Currently I have a scheme going that uses numpy to generate all of the n

6 answers
  •  眼角桃花
    2020-12-13 20:20

    You can speed things up a bit from what mtrw posted above just by doing what you initially described (generating a bunch of random numbers and multiplying and dividing accordingly)...

    Also, you probably already know this, but be sure to do the operations in-place (*=, /=, +=, etc) when working with large-ish numpy arrays. It makes a huge difference in memory usage with large arrays, and will give a considerable speed increase, too.
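
To make the in-place point concrete, here is a minimal sketch (not taken from the original answer) of the difference between `x *= limits` and `x * limits`. The array shape, the seed, and the names `limits`, `buf`, `in_place_reused` and `copy_shares` are illustrative assumptions, and `np.random.default_rng` requires NumPy 1.17 or later.

```python
import numpy as np

rng = np.random.default_rng(0)    # seeded only so the sketch is repeatable
x = rng.random((1000, 7))         # uniform floats in [0, 1)
limits = np.arange(7) + 1         # hypothetical per-column upper bounds

buf = x                           # second reference to the same array object
x *= limits                       # in-place: writes into the existing buffer
in_place_reused = x is buf        # the name x still refers to the same array

y = x * limits                    # out-of-place: allocates a new result array
copy_shares = np.shares_memory(x, y)  # y has its own, separate buffer

print(in_place_reused, copy_shares)
```

The out-of-place form needs a second array the size of `x` for the result, which is where the extra memory and allocation time come from on large arrays.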

    In [53]: def rand_row_doubles(row_limits, num):
       ....:     ncols = len(row_limits)
       ....:     x = np.random.random((num, ncols))
       ....:     x *= row_limits                  
       ....:     return x                          
       ....:                                       
    In [59]: %timeit rand_row_doubles(np.arange(7) + 1, 1000000)
    10 loops, best of 3: 187 ms per loop
    

    As compared to:

    In [66]: %timeit ManyRandDoubles(np.arange(7) + 1, 1000000)
    1 loops, best of 3: 222 ms per loop
    

    It's not a huge difference, but if you're really worried about speed, it's something.

    Just to show that it's correct (here x is an array returned by rand_row_doubles with row_limits np.arange(7) + 1):

    In [68]: x.max(0)
    Out[68]:
    array([ 0.99999991,  1.99999971,  2.99999737,  3.99999569,  4.99999836,
            5.99999114,  6.99999738])
    
    In [69]: x.min(0)
    Out[69]:
    array([  4.02099599e-07,   4.41729377e-07,   4.33480302e-08,
             7.43497138e-06,   1.28446819e-05,   4.27614385e-07,
             1.34106753e-05])
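
The same check can be run outside an IPython session. The sketch below is a standalone variant of `rand_row_doubles`, not the answer's exact code: it swaps in NumPy's newer `Generator` API (`np.random.default_rng`, NumPy 1.17+) for `np.random.random`, and the `seed` argument and the `within_bounds` name are additions made here for illustration.

```python
import numpy as np

def rand_row_doubles(row_limits, num, seed=None):
    """Return a (num, ncols) array; column j is uniform on [0, row_limits[j])."""
    row_limits = np.asarray(row_limits, dtype=float)
    rng = np.random.default_rng(seed)   # seed is a hypothetical extra argument
    x = rng.random((num, row_limits.size))
    x *= row_limits                     # in-place; limits broadcast across every row
    return x

limits = np.arange(7) + 1
x = rand_row_doubles(limits, 1_000_000, seed=0)

# Every value should be non-negative and below its column's limit.
within_bounds = bool((x >= 0).all() and (x < limits).all())
print(x.shape, within_bounds)
```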
    

    Likewise, for your "rows sum to one" part...

    In [70]: def rand_rows_sum_to_one(nrows, ncols):
       ....:     x = np.random.random((ncols, nrows))
       ....:     y = x.sum(axis=0)
       ....:     x /= y
       ....:     return x.T
       ....:
    
    In [71]: %timeit rand_rows_sum_to_one(1000000, 13)
    1 loops, best of 3: 455 ms per loop
    
    In [72]: x = rand_rows_sum_to_one(1000000, 13)
    
    In [73]: x.sum(axis=1)
    Out[73]: array([ 1.,  1.,  1., ...,  1.,  1.,  1.])
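
As a script-form sketch of the same normalization, the version below generates the array in `(nrows, ncols)` layout and divides by the row totals using `keepdims=True`, so no transpose is needed. This is an alternative arrangement under stated assumptions, not the answer's code: it uses `np.random.default_rng` (NumPy 1.17+), and the `seed` argument and `rows_ok` name are hypothetical. No timing is claimed for it relative to the transposed version above.

```python
import numpy as np

def rand_rows_sum_to_one(nrows, ncols, seed=None):
    """Return an (nrows, ncols) array of non-negative values whose rows sum to 1."""
    rng = np.random.default_rng(seed)   # seed is a hypothetical extra argument
    x = rng.random((nrows, ncols))
    # keepdims=True gives an (nrows, 1) column of totals that broadcasts per row.
    x /= x.sum(axis=1, keepdims=True)
    return x

x = rand_rows_sum_to_one(1_000_000, 13, seed=0)

# Compare with a tolerance rather than ==, since the sums are floating point.
rows_ok = bool(np.allclose(x.sum(axis=1), 1.0))
print(x.shape, rows_ok)
```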
    

    Honestly, even if you re-implement things in C, I'm not sure you'll be able to beat numpy by much on this one... I could be very wrong, though!
