built-in range or numpy.arange: which is more efficient?

泄露秘密 提交于 2019-11-28 16:11:11
bmu

For large arrays numpy should be the faster solution.

In numpy you should use combinations of vectorized calculations, ufuncs and indexing to solve your problems as it runs at C speed. Looping over numpy arrays is inefficient compared to this.

(Something like the worst thing you could do would be to iterate over the array with an index created with range or np.arange as the first sentence in your question suggests, but I'm not sure if you really mean that.)

import numpy as np
import sys

sys.version
# out: '2.7.3rc2 (default, Mar 22 2012, 04:35:15) \n[GCC 4.6.3]'
np.version.version
# out: '1.6.2'

size = int(1E6)

%timeit for x in range(size): x ** 2
# out: 10 loops, best of 3: 136 ms per loop

%timeit for x in xrange(size): x ** 2
# out: 10 loops, best of 3: 88.9 ms per loop

# avoid this
%timeit for x in np.arange(size): x ** 2
#out: 1 loops, best of 3: 1.16 s per loop

# use this
%timeit np.arange(size) ** 2
#out: 100 loops, best of 3: 19.5 ms per loop

So for this case numpy is 4 times faster than using xrange if you do it right. Depending on your problem numpy can be much faster than a 4 or 5 times speed up.

The answers to this question explain some more advantages of using numpy arrays instead of python lists for large data sets.

First of all, as written by @bmu, you should use combinations of vectorized calculations, ufuncs and indexing. There are indeed some cases where explicit looping is required, but those are really rare.

If explicit loop is needed, with python 2.6 and 2.7, you should use xrange (see below). From what you say, in Python 3, range is the same as xrange (returns a generator). So maybe range is as good for you.

Now, you should try it yourself (using timeit: - here the ipython "magic function"):

%timeit for i in range(1000000): pass
[out] 10 loops, best of 3: 63.6 ms per loop

%timeit for i in np.arange(1000000): pass
[out] 10 loops, best of 3: 158 ms per loop

%timeit for i in xrange(1000000): pass
[out] 10 loops, best of 3: 23.4 ms per loop

Again, as mentioned above, most of the time it is possible to use numpy vector/array formula (or ufunc etc...) which run a c speed: much faster. This is what we could call "vector programming". It makes program easier to implement than C (and more readable) but almost as fast in the end.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!