Question
I'm currently experimenting with Numba, and especially with vectorized functions, so I created a vectorized sum function (because it is easy to compare it to np.sum).
import numpy as np
import numba as nb

@nb.vectorize([nb.float64(nb.float64, nb.float64)])
def numba_sum(element1, element2):
    return element1 + element2

@nb.vectorize([nb.float64(nb.float64, nb.float64)], target='parallel')
def numba_sum_parallel(element1, element2):
    return element1 + element2

elements = 100000  # failures appear at varying sizes, see below
array = np.ones(elements)
np.testing.assert_almost_equal(numba_sum.reduce(array), np.sum(array))
np.testing.assert_almost_equal(numba_sum_parallel.reduce(array), np.sum(array))
Depending on the number of elements, the parallel code does not return the same number as the CPU-targeted code. I think that is related to the usual threading problems (but why? Is that a bug in Numba, or something that just happens with parallel execution?). Oddly, it sometimes works and sometimes does not: sometimes it fails with elements=1000, sometimes it only starts failing at elements=100000.
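As a sanity check of what reduce() should produce, the same reduction can be done with NumPy's own np.add ufunc, which is sequential and deterministic (a minimal sketch, no Numba required):

```python
import numpy as np

elements = 100000
array = np.ones(elements)

# np.add is the ufunc behind np.sum; its reduce() folds the array
# into a single accumulator, so the result is always exact here.
total = np.add.reduce(array)
assert total == 100000.0
assert total == np.sum(array)
```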
For example:
AssertionError:
Arrays are not almost equal to 7 decimals
ACTUAL: 93238.0
DESIRED: 100000.0
and if I run it again
AssertionError:
Arrays are not almost equal to 7 decimals
ACTUAL: 83883.0
DESIRED: 100000.0
My question is now: why would I ever want a parallel vectorized function? My understanding is that the purpose of a vectorized function is to provide the NumPy ufunc machinery, but I tested reduce and accumulate, and they stop working at some (variable) number of elements; who wants an unreliable function?
I'm using numba 0.23.1 and numpy 1.10.1 with python 3.5.1.
Answer 1:
You ask:
where would "parallel" vectorized functions make sense, given that they can lead to such problems
Given that ufuncs produced by numba.vectorize(target='parallel') have defective reduce() methods, the question is what can we do with them that is useful?
In your case, the ufunc does addition. A useful application of this with target='parallel' is elementwise addition of two arrays:
numba_sum(array, array)
This is indeed faster than a single-core solution, and seems not to be impacted by the bugs that cripple reduce() and friends.
Source: https://stackoverflow.com/questions/35459065/numbas-parallel-vectorized-functions