Improving Numpy Performance

Asked by 暖寄归人 on 2020-12-08 11:04 · 5 answers · 1204 views

I'd like to improve the performance of convolution in Python, and was hoping for some insight on how best to go about it.

I am currently usin

5 Answers
  •  离开以前
    2020-12-08 11:55

    Before reaching for C with ctypes, I'd suggest running a standalone convolve in C to see where the limit is.
    Similarly for CUDA, Cython, scipy.weave ...
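    Before comparing against C, it helps to know your Python baseline. A minimal sketch (my illustration, not from the answer; array size and kernel are arbitrary) that times scipy.signal.convolve2d and reports cost per output point:

```python
import time

import numpy as np
from scipy.signal import convolve2d

# Synthetic 8-bit-range image and a 3x3 averaging kernel (assumed sizes)
a = np.random.randint(0, 256, (1024, 1024)).astype(np.float32)
k = np.ones((3, 3), np.float32) / 9.0

t0 = time.perf_counter()
out = convolve2d(a, k, mode="same")  # same-size output as the input
dt = time.perf_counter() - t0

# Nanoseconds per output point: directly comparable to a C loop's timing
print(f"{dt * 1e9 / a.size:.1f} ns per output point")
```

    If a tight C loop comes in far under this number, the overhead is in Python/SciPy dispatch rather than the arithmetic itself.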

    Added 7 Feb: convolve33 (a 3x3 convolve) on 8-bit data with clipping takes ~20 clock cycles per point, 2 clock cycles per memory access, on my Mac G4 PPC with gcc 4.2. Your mileage will vary.

    A couple of subtleties:

    • Do you care about correct clipping to 0..255? np.clip() is slow; whether Cython etc. handle it, I don't know.
    • NumPy/SciPy may need temporary memory the size of A (so keep 2*sizeof(A) < cache size).
      If your C code does a running update in place, though, that's half the memory, but a different algorithm.
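    On the clipping point, a small NumPy sketch (my illustration, not the answerer's code): np.clip normally allocates a fresh output array, but passing out= writes into an existing buffer, which trims one temporary the size of the result:

```python
import numpy as np

# Values that overshoot the 8-bit range, as a convolution result might
a = np.random.randint(0, 256, (512, 512)).astype(np.int32)
res = a * 2 - 100  # range is now roughly -100..410

# Out-of-place: allocates a temporary the size of `res`, then a uint8 copy
clipped = np.clip(res, 0, 255).astype(np.uint8)

# In-place: reuses res's own buffer, so no extra full-size temporary
np.clip(res, 0, 255, out=res)
```

    The in-place form matters most when 2*sizeof(A) would spill out of cache, per the bullet above.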

    By the way, googling "theano convolve" turns up: "A convolution op that should mimic scipy.signal.convolve2d, but faster! In development"
