Python: rewrite a looping numpy math function to run on GPU

既然无缘 2021-01-30 23:31

Can someone help me rewrite this one function (the doTheMath function) to do the calculations on the GPU? I used a few good days now trying to get my head

5 answers

误落风尘 (OP)

2021-01-30 23:39
Tweak #1

It's usually advised to vectorize things when working with NumPy arrays. But with very large arrays, I think you are out of options there. So, to boost performance, a minor tweak is possible in the last step, the summing.

We could replace the step that builds an array of 1s and 0s and then sums it:
```
np.where(((abcd <= data2a) & (abcd >= data2b)), 1, 0).sum()
```
with np.count_nonzero, which counts the True values in a boolean array directly instead of first converting them to 1s and 0s -
```
np.count_nonzero((abcd <= data2a) & (abcd >= data2b))
```
Runtime test -
```
In [45]: abcd = np.random.randint(11,99,(10000))

In [46]: data2a = np.random.randint(11,99,(10000))

In [47]: data2b = np.random.randint(11,99,(10000))

In [48]: %timeit np.where(((abcd <= data2a) & (abcd >= data2b)), 1, 0).sum()
10000 loops, best of 3: 81.8 µs per loop

In [49]: %timeit np.count_nonzero((abcd <= data2a) & (abcd >= data2b))
10000 loops, best of 3: 28.8 µs per loop
```
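As a quick sanity check that the two expressions above return the same count, here is a minimal sketch. The array sizes and value ranges mirror the runtime test; the seeded generator and the names `mask`, `count_where` and `count_nz` are only illustrative and not part of the original function.

```python
import numpy as np

# Seeded generator so the check is reproducible (illustrative choice).
rng = np.random.default_rng(0)
abcd = rng.integers(11, 99, 10000)
data2a = rng.integers(11, 99, 10000)
data2b = rng.integers(11, 99, 10000)

# Boolean mask shared by both approaches.
mask = (abcd <= data2a) & (abcd >= data2b)

# Original approach: materialize an array of 1s and 0s, then sum it.
count_where = np.where(mask, 1, 0).sum()

# Tweak #1: count the True values directly.
count_nz = np.count_nonzero(mask)

print(count_where == count_nz)
```

Both approaches scan the same boolean mask; the second simply skips the intermediate integer array.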
Tweak #2

Use a pre-computed reciprocal when dealing with cases that undergo implicit broadcasting. Thus, store the reciprocal of dif once and multiply by it, instead of dividing, at the step:
```
((((A  - Cmin) / dif) + ((B  - Cmin) / dif) + ...
```
Sample test -
```
In [52]: A = np.random.rand(10000)

In [53]: dif = 0.5

In [54]: %timeit A/dif
10000 loops, best of 3: 25.8 µs per loop

In [55]: %timeit A*(1.0/dif)
100000 loops, best of 3: 7.94 µs per loop
```
You divide by dif in four places, so hopefully this brings a noticeable boost there too!
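Applied to the expression from Tweak #2, the change might look like the sketch below. Only two of the four terms are shown, and the values chosen for `A`, `B`, `Cmin` and `dif` are placeholders for illustration, not the data from the question.

```python
import numpy as np

# Placeholder inputs (assumed values, for illustration only).
rng = np.random.default_rng(0)
A = rng.random(10000)
B = rng.random(10000)
Cmin = 0.1
dif = 0.5

# Compute the reciprocal once, outside the array expression.
inv_dif = 1.0 / dif

# Original form: one array division per term.
direct = ((A - Cmin) / dif) + ((B - Cmin) / dif)

# Tweaked form: multiply each term by the stored reciprocal.
via_reciprocal = ((A - Cmin) * inv_dif) + ((B - Cmin) * inv_dif)

# The two forms can differ in the last floating-point bits in general,
# so compare with a tolerance rather than exact equality.
print(np.allclose(direct, via_reciprocal))
```

Multiplying by a reciprocal is not always bit-identical to dividing, so this trade is only appropriate where rounding differences in the last bits are acceptable.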