Can someone help me rewrite this one function (the doTheMath function) to do the calculations on the GPU? I have spent a good few days now trying to get my head
Tweak #1
It's usually advised to vectorize things when working with NumPy arrays, but with very large arrays I think you are out of options there. So, to boost performance, one minor tweak is possible: optimizing the last step, the summing.
We could replace the step that builds an array of 1s and 0s and then sums it:
np.where(((abcd <= data2a) & (abcd >= data2b)), 1, 0).sum()
with np.count_nonzero, which efficiently counts the True values in a boolean array instead of converting them to 1s and 0s:
np.count_nonzero((abcd <= data2a) & (abcd >= data2b))
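To convince yourself the two forms return the same count, a minimal check could look like this. The helper names `count_in_range_where` and `count_in_range_nonzero` are hypothetical, made up for illustration; the random inputs mirror the runtime test below.

```python
import numpy as np


def count_in_range_where(abcd, data2a, data2b):
    # Original form: materialize an int array of 1s and 0s, then sum it
    return np.where((abcd <= data2a) & (abcd >= data2b), 1, 0).sum()


def count_in_range_nonzero(abcd, data2a, data2b):
    # Tweak #1: count the True entries of the boolean mask directly
    return np.count_nonzero((abcd <= data2a) & (abcd >= data2b))


abcd = np.random.randint(11, 99, 10000)
data2a = np.random.randint(11, 99, 10000)
data2b = np.random.randint(11, 99, 10000)

# Both approaches must agree on how many elements fall inside the range
assert count_in_range_where(abcd, data2a, data2b) == count_in_range_nonzero(abcd, data2a, data2b)
```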
Runtime test:
In [45]: abcd = np.random.randint(11,99,(10000))
In [46]: data2a = np.random.randint(11,99,(10000))
In [47]: data2b = np.random.randint(11,99,(10000))
In [48]: %timeit np.where(((abcd <= data2a) & (abcd >= data2b)), 1, 0).sum()
10000 loops, best of 3: 81.8 µs per loop
In [49]: %timeit np.count_nonzero((abcd <= data2a) & (abcd >= data2b))
10000 loops, best of 3: 28.8 µs per loop
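`%timeit` is an IPython magic, so outside IPython a similar comparison could be run with the standard-library `timeit` module. This is only a sketch: the function names are hypothetical, and absolute timings will vary with the machine and NumPy version.

```python
import timeit

import numpy as np

abcd = np.random.randint(11, 99, 10000)
data2a = np.random.randint(11, 99, 10000)
data2b = np.random.randint(11, 99, 10000)


def with_where():
    # Builds an intermediate int array, then sums it
    return np.where((abcd <= data2a) & (abcd >= data2b), 1, 0).sum()


def with_count_nonzero():
    # Counts True values of the boolean mask without the int conversion
    return np.count_nonzero((abcd <= data2a) & (abcd >= data2b))


def best_per_call(fn, number=1000, repeat=3):
    # Mirrors "best of 3": fastest of `repeat` runs, averaged per call
    return min(timeit.repeat(fn, number=number, repeat=repeat)) / number


for fn in (with_where, with_count_nonzero):
    print(f"{fn.__name__}: {best_per_call(fn) * 1e6:.1f} us per loop")
```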
Tweak #2
Use a precomputed reciprocal when dividing an array by a scalar that gets implicitly broadcast, since multiplication is generally cheaper than division. So, store the reciprocal of dif and use it instead at this step:
((((A - Cmin) / dif) + ((B - Cmin) / dif) + ...
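Applied to that step, the change could look like the sketch below. The original expression is cut off after the second term, so `C` and `D` are hypothetical stand-ins for the remaining two of the four divisions, and the function names are likewise invented for illustration.

```python
import numpy as np


def terms_with_division(A, B, C, D, Cmin, dif):
    # Original pattern: every term divides by the scalar dif separately
    return ((A - Cmin) / dif) + ((B - Cmin) / dif) + ((C - Cmin) / dif) + ((D - Cmin) / dif)


def terms_with_reciprocal(A, B, C, D, Cmin, dif):
    # Tweak #2: compute 1/dif once, then multiply in every term
    inv_dif = 1.0 / dif
    return ((A - Cmin) * inv_dif) + ((B - Cmin) * inv_dif) + ((C - Cmin) * inv_dif) + ((D - Cmin) * inv_dif)
```

Because `inv_dif` is a plain scalar, NumPy broadcasts it over each array exactly as it did with `dif`.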
Sample test:
In [52]: A = np.random.rand(10000)
In [53]: dif = 0.5
In [54]: %timeit A/dif
10000 loops, best of 3: 25.8 µs per loop
In [55]: %timeit A*(1.0/dif)
100000 loops, best of 3: 7.94 µs per loop
You have four places that divide by dif, so hopefully this brings a noticeable boost there too!
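One caveat worth checking: `A * (1.0/dif)` is not guaranteed to be bit-identical to `A / dif`, because `1.0/dif` is itself rounded. With `dif = 0.5` the reciprocal (2.0) is exact, so the results match; for a value like 0.3 they should only agree to within floating-point rounding. A small sketch (the helper name `compare_division` is hypothetical):

```python
import numpy as np


def compare_division(A, dif):
    by_division = A / dif
    by_reciprocal = A * (1.0 / dif)
    # exact: bit-for-bit equality; close: equal up to floating-point rounding
    exact = np.array_equal(by_division, by_reciprocal)
    close = np.allclose(by_division, by_reciprocal, rtol=1e-15, atol=0.0)
    return exact, close


A = np.random.rand(10000)
for dif in (0.5, 0.3):
    print(dif, compare_division(A, dif))
```

If results must match the old division-based output exactly, it may be worth running this against the real dif values.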