I'm currently experimenting with numba, especially with vectorized functions, so I created a vectorized sum function (because it is easy to compare against np.sum):
import numpy as np
import numba as nb

@nb.vectorize([nb.float64(nb.float64, nb.float64)])
def numba_sum(element1, element2):
    return element1 + element2

@nb.vectorize([nb.float64(nb.float64, nb.float64)], target='parallel')
def numba_sum_parallel(element1, element2):
    return element1 + element2

elements = 100000  # the failures below occur at varying sizes
array = np.ones(elements)
np.testing.assert_almost_equal(numba_sum.reduce(array), np.sum(array))
np.testing.assert_almost_equal(numba_sum_parallel.reduce(array), np.sum(array))
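For reference, `reduce` here is the standard NumPy ufunc reduction: the binary operation is folded cumulatively over the array. A minimal sketch of the same semantics using the built-in `np.add` ufunc (a stand-in for the numba-compiled functions above, so it runs without numba installed):

```python
import numpy as np

elements = 100000
array = np.ones(elements)

# np.add is the built-in ufunc equivalent of numba_sum; its reduce()
# folds the binary operation over the array, which for addition is a sum.
total = np.add.reduce(array)
print(total)            # 100000.0, identical to np.sum(array)

# accumulate() keeps all partial results instead of just the final one.
partials = np.add.accumulate(array)
print(partials[:3])     # [1. 2. 3.]
```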
Depending on the number of elements, the parallel code does not return the same result as the CPU-targeted code. I suspect this is related to the usual threading problems (but why? Is it a bug in Numba, or something that inevitably happens with parallel execution?). Oddly, sometimes it works and sometimes it does not: sometimes it fails at elements=1000, sometimes it only starts failing at elements=100000.
For example:
AssertionError:
Arrays are not almost equal to 7 decimals
ACTUAL: 93238.0
DESIRED: 100000.0
and if I run it again
AssertionError:
Arrays are not almost equal to 7 decimals
ACTUAL: 83883.0
DESIRED: 100000.0
My question is now: why would I ever want a parallel vectorized function? My understanding is that the purpose of a vectorized function is to provide the numpy ufunc machinery, but I tested reduce and accumulate and they stop working at some (variable) number of elements, and who wants an unreliable function?
I'm using numba 0.23.1 and numpy 1.10.1 with Python 3.5.1.
You ask:
where would "parallel" vectorized functions make sense given that they can lead to such problems
Given that ufuncs produced by numba.vectorize(target='parallel') have defective reduce() methods, the question is what can we do with them that is useful?
In your case, the ufunc does addition. A useful application of this with target='parallel' is elementwise addition of two arrays:
numba_sum_parallel(array, array)
This is indeed faster than a single-core solution, and it does not appear to be affected by the bugs that cripple reduce() and friends.
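A deterministic way to combine both worlds is to do the elementwise work with the (parallelizable) ufunc and the reduction with a single sequential call. The sketch below uses `np.add` as a stand-in for `numba_sum_parallel` so it runs without numba installed; it also shows, for illustration, how a correct parallel reduction would combine per-chunk partial sums instead of racing on one accumulator:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

a = np.ones(100000)
b = np.ones(100000)

# Pattern 1: parallel elementwise work, sequential reduction.
# np.add stands in for numba_sum_parallel here; with numba you would
# call numba_sum_parallel(a, b) instead.
elementwise = np.add(a, b)     # safe to parallelize: no shared state
total = np.sum(elementwise)    # one sequential reduction: deterministic
print(total)                   # 200000.0

# Pattern 2: a correct parallel reduction sums independent chunks and
# then combines the partial results, so no two workers ever write to
# the same accumulator.
chunks = np.array_split(a, 4)
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(np.sum, chunks))
print(sum(partials))           # 100000.0
```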
Source: https://stackoverflow.com/questions/35459065/numbas-parallel-vectorized-functions