built-in range or numpy.arange: which is more efficient?

别跟我提以往  2020-12-07 10:41

When iterating over a large array with a range expression, should I use Python's built-in range function, or numpy's arange to get the best performance?

2 Answers
  •  夕颜 (OP)
     2020-12-07 10:50

    For large arrays, numpy should be the faster solution.

    In numpy you should use combinations of vectorized calculations, ufuncs, and indexing to solve your problems, since these run at C speed. Looping over a numpy array element by element is inefficient by comparison.
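    As a minimal sketch (not from the original answer) of what "vectorized calculations, ufuncs and indexing" look like in practice; the array `a` and the threshold are just made-up example data:

    import numpy as np

    a = np.arange(10, dtype=float)  # example data

    # ufunc: applied to the whole array in one C-level pass, no Python loop
    roots = np.sqrt(a)

    # vectorized arithmetic: element-wise, again without an explicit loop
    scaled = 2.0 * a + 1.0

    # boolean indexing: select elements above a threshold without looping
    big = a[a > 5]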

    (Just about the worst thing you could do would be to iterate over the array with an index created by range or np.arange, as the first sentence of your question suggests, but I'm not sure that's what you really mean.)

    import numpy as np
    import sys
    
    sys.version
    # out: '2.7.3rc2 (default, Mar 22 2012, 04:35:15) \n[GCC 4.6.3]'
    np.version.version
    # out: '1.6.2'
    
    size = int(1E6)
    
    %timeit for x in range(size): x ** 2
    # out: 10 loops, best of 3: 136 ms per loop
    
    %timeit for x in xrange(size): x ** 2
    # out: 10 loops, best of 3: 88.9 ms per loop
    
    # avoid this
    %timeit for x in np.arange(size): x ** 2
    # out: 1 loops, best of 3: 1.16 s per loop
    
    # use this
    %timeit np.arange(size) ** 2
    # out: 100 loops, best of 3: 19.5 ms per loop
    

    So for this case numpy is about 4 times faster than xrange if you do it right. Depending on your problem, numpy can achieve much larger speedups than a factor of 4 or 5.
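    (As a rough sketch: on Python 3 there is no xrange and %timeit is IPython-specific, so an approximate stand-alone equivalent of the benchmark above could use the timeit module as below. The function names are placeholders, and the actual numbers will of course depend on your machine.)

    import timeit
    import numpy as np

    size = int(1E6)

    def loop_range():            # pure-Python loop
        for x in range(size):
            x ** 2

    def loop_arange():           # avoid this: Python loop over a numpy array
        for x in np.arange(size):
            x ** 2

    def vectorized():            # use this: one vectorized operation
        np.arange(size) ** 2

    for f in (loop_range, loop_arange, vectorized):
        print(f.__name__, timeit.timeit(f, number=10))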

    The answers to this question explain some more advantages of using numpy arrays instead of Python lists for large data sets.
