Question
Given a NumPy array A, what is the fastest/most efficient way to apply the same function, f, to every cell?
Suppose that we want to assign f(A(i,j)) to A(i,j).
The function, f, doesn't have a binary output, so masking operations won't help.
Is the "obvious" double loop iteration (through every cell) the optimal solution?
Answer 1:
You could just vectorize the function and then apply it directly to a Numpy array each time you need it:
import numpy as np
def f(x):
    return x * x + 3 * x - 2 if x > 0 else x * 5 + 8
f = np.vectorize(f) # or use a different name if you want to keep the original f
result_array = f(A) # if A is your Numpy array
It's probably better to specify an explicit output type directly when vectorizing:
f = np.vectorize(f, otypes=[float])
Answer 2:
A similar question is: Mapping a NumPy array in place. If you can find a ufunc for your f(), then you should use the out parameter.
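For instance, a minimal sketch assuming f happens to match an existing ufunc (np.exp here, purely as an illustration): the out parameter writes the result back into A without allocating a new array.
import numpy as np
A = np.random.rand(3, 3)
np.exp(A, out=A)  # applies the ufunc in place; A now holds exp of its old values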
Answer 3:
If you are working with numbers and your function is symmetric over pairs, i.e. f(A(i), A(j)) = f(A(j), A(i)), you could use scipy.spatial.distance.cdist, defining f as a distance between A(i) and A(j).
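As an illustration (the symmetric function below is only a stand-in), cdist accepts a Python callable as the metric and evaluates it for every pair of rows:
import numpy as np
from scipy.spatial.distance import cdist
def f(u, v):  # example symmetric function of two rows
    return np.abs(u - v).sum()
A = np.random.rand(5, 3)
result = cdist(A, A, metric=f)  # result[i, j] == f(A[i], A[j])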
Answer 4:
I believe I have found a better solution. The idea is to convert the function into a NumPy universal function (ufunc; see the documentation), which can exercise parallel computation under the hood.
One can write a customised ufunc in C, which is surely more efficient, or invoke np.frompyfunc, which is a built-in factory method. After testing, this is more efficient than np.vectorize:
f = lambda x, y: x * y
f_arr = np.frompyfunc(f, 2, 1)
vf = np.vectorize(f)
arr = np.linspace(0, 1, 10000)
%timeit f_arr(arr, arr) # 307ms
%timeit vf(arr, arr)    # 450ms
I have also tested larger samples, and the improvement is proportional. For a comparison of the performance of other methods, see this post.
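One caveat worth noting (a usage detail, not part of the original answer): np.frompyfunc always returns an object-dtype array, so you may want to cast the result back to a numeric dtype, e.g.:
import numpy as np
f = lambda x, y: x * y
f_arr = np.frompyfunc(f, 2, 1)
arr = np.linspace(0, 1, 10000)
out = f_arr(arr, arr).astype(float)  # frompyfunc yields dtype=object; cast back for further numeric work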
Source: https://stackoverflow.com/questions/7701429/efficient-evaluation-of-a-function-at-every-cell-of-a-numpy-array