Python: rewrite a looping NumPy math function to run on the GPU

既然无缘 2021-01-30 23:31

Can someone help me rewrite this one function (the doTheMath function) so that it does the calculations on the GPU? I have spent a good few days now trying to get my head…

5 answers
  •  误落风尘
    2021-01-30 23:39

    Tweak #1

    It's usually advised to vectorize operations when working with NumPy arrays, but with very large arrays I think you are out of options there. So, to boost performance, a minor tweak is possible on the last step, the summing.

    We could replace the step that builds an array of 1s and 0s and then sums it:

    np.where(((abcd <= data2a) & (abcd >= data2b)), 1, 0).sum()
    

    with np.count_nonzero, which efficiently counts the True values in a boolean array without first converting them to 1s and 0s:

    np.count_nonzero((abcd <= data2a) & (abcd >= data2b))
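    To confirm that the swap does not change the result, here is a minimal sketch comparing the two forms on random data like that in the runtime test. The function names count_in_range_where and count_in_range_nonzero are hypothetical, and the sketch assumes abcd, data2a and data2b are same-shaped integer arrays. It uses np.random.default_rng instead of np.random.randint only to get reproducible seeding.

```python
import numpy as np


def count_in_range_where(abcd, data2a, data2b):
    # Original approach: build an integer array of 1s and 0s, then sum it.
    return np.where((abcd <= data2a) & (abcd >= data2b), 1, 0).sum()


def count_in_range_nonzero(abcd, data2a, data2b):
    # Suggested approach: count the True entries of the boolean mask directly,
    # skipping the intermediate integer array.
    return np.count_nonzero((abcd <= data2a) & (abcd >= data2b))


# Random integers in [11, 99), similar to the runtime test.
rng = np.random.default_rng(0)
abcd = rng.integers(11, 99, 10000)
data2a = rng.integers(11, 99, 10000)
data2b = rng.integers(11, 99, 10000)

# Both approaches should report the same count.
print(count_in_range_where(abcd, data2a, data2b) ==
      count_in_range_nonzero(abcd, data2a, data2b))  # True
```

    Only the counting step changes; the comparison mask itself is built the same way in both versions.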
    

    Runtime test -

    In [45]: abcd = np.random.randint(11,99,(10000))
    
    In [46]: data2a = np.random.randint(11,99,(10000))
    
    In [47]: data2b = np.random.randint(11,99,(10000))
    
    In [48]: %timeit np.where(((abcd <= data2a) & (abcd >= data2b)), 1, 0).sum()
    10000 loops, best of 3: 81.8 µs per loop
    
    In [49]: %timeit np.count_nonzero((abcd <= data2a) & (abcd >= data2b))
    10000 loops, best of 3: 28.8 µs per loop
    

    Tweak #2

    Use a precomputed reciprocal when dividing arrays by a value that is implicitly broadcast across them, such as a scalar. Some more info here. So, store the reciprocal of dif and use that instead at this step:

    ((((A  - Cmin) / dif) + ((B  - Cmin) / dif) + ...
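    The rest of that expression is elided in the post, so the following is only a sketch that assumes it is a sum of terms of the form (X - Cmin) / dif, one per input array. The function names scaled_sum_divide and scaled_sum_reciprocal and the list-of-arrays argument are hypothetical. The reciprocal is computed once and reused in every term, which works whether dif is a scalar or an array that broadcasts against the inputs.

```python
import numpy as np


def scaled_sum_divide(arrays, cmin, dif):
    # Baseline (assumed shape of the elided expression):
    # one full-array division by dif per term.
    return sum((x - cmin) / dif for x in arrays)


def scaled_sum_reciprocal(arrays, cmin, dif):
    # Tweak #2: compute 1/dif once, then multiply in every term.
    inv_dif = 1.0 / dif
    return sum((x - cmin) * inv_dif for x in arrays)


# Example with four input arrays, matching the four divisions in the question.
rng = np.random.default_rng(0)
arrays = [rng.random(10000) for _ in range(4)]
print(np.allclose(scaled_sum_divide(arrays, 0.1, 0.7),
                  scaled_sum_reciprocal(arrays, 0.1, 0.7)))  # True
```

    Multiplication is typically cheaper than division on CPUs, so replacing each full-array division with a multiplication is likely where the speedup in the sample test comes from.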
    

    Sample test -

    In [52]: A = np.random.rand(10000)
    
    In [53]: dif = 0.5
    
    In [54]: %timeit A/dif
    10000 loops, best of 3: 25.8 µs per loop
    
    In [55]: %timeit A*(1.0/dif)
    100000 loops, best of 3: 7.94 µs per loop
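    One caveat: multiplying by a precomputed reciprocal is not always bit-for-bit identical to dividing, because 1.0 / dif is itself rounded unless dif is a power of two. With dif = 0.5, as in the sample test, the results match exactly. For other values they agree only to within rounding error. A minimal check, assuming float64 data in [0, 1) like np.random.rand produces:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random(10000)

# dif = 0.5: 1.0 / 0.5 == 2.0 exactly, so both forms give identical results.
dif = 0.5
exact = np.array_equal(A / dif, A * (1.0 / dif))

# dif = 3.0: 1/3 is not exactly representable, so some elements may differ
# in the last bits; compare with a tolerance instead of exact equality.
dif = 3.0
n_diff = np.count_nonzero(A / dif != A * (1.0 / dif))
close = np.allclose(A / dif, A * (1.0 / dif))

print(exact, close)  # True True
print(n_diff)  # number of elements that differ; depends on the data
```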
    

    You have four places that divide by dif, so hopefully this brings a noticeable boost there too!
