Fastest 2D convolution or image filter in Python

遥遥无期 2020-12-24 03:35

Several users have asked about the speed or memory consumption of image convolutions in numpy or scipy [1, 2, 3, 4]. From the responses and my experience using Numpy, I believe this may be a major shortcoming of numpy compared to matlab or idl.

5 Answers
  • 2020-12-24 04:04

    I have been trying to improve the convolution speed in my application, and I found that signal.correlate is about 20 times slower than signal.correlate2d when my input matrices are small (27x27 and 5x5). As of 2018, this is what I observed on my machine (Dell Inspiron 13, Core i5) for the matrices specified in the original question.
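
    For reference, a minimal harness to reproduce that small-matrix comparison on your own machine (sizes taken from the paragraph above; not part of the timings below):

    import timeit

    setup = """
    import numpy as np
    from scipy import signal
    a = np.random.rand(27, 27)
    b = np.random.rand(5, 5)
    """

    for stmt in ("signal.correlate(a, b, mode='valid')",
                 "signal.correlate2d(a, b, mode='valid')"):
        print(stmt, timeit.timeit(stmt, setup=setup, number=1000))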

    OpenCV did the best, but the caveat is that it doesn't offer "mode" options: the input and output are always the same size (see the sketch after the timings for one way to emulate 'valid' mode).

    >>> import time
    >>> import numpy as np
    >>> import cv2
    >>> from scipy import signal
    >>> img = np.random.rand(2048, 2048)
    >>> kernel = np.ones((32, 32), dtype=np.float64)
    >>> t1 = time.time(); dst1 = cv2.filter2D(img, -1, kernel); print(time.time() - t1)
    0.208490133286
    >>> t1 = time.time(); dst2 = signal.correlate(img, kernel, mode='valid', method='fft'); print(time.time() - t1)
    0.582989931107
    >>> t1 = time.time(); dst3 = signal.convolve2d(img, kernel, mode='valid'); print(time.time() - t1)
    11.2672450542
    >>> t1 = time.time(); dst4 = signal.correlate2d(img, kernel, mode='valid'); print(time.time() - t1)
    11.2443971634
    >>> t1 = time.time(); dst5 = signal.fftconvolve(img, kernel, mode='valid'); print(time.time() - t1)
    0.581533193588
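
    If you do need something comparable to mode='valid' from OpenCV, one workaround (a sketch, not benchmarked above) is to pin the kernel anchor to the top-left corner and crop the region where the kernel fits entirely inside the image. Note that cv2.filter2D computes a correlation rather than a convolution; flip the kernel if you need a true convolution (for the symmetric np.ones kernel above it makes no difference).

    >>> dst1_full = cv2.filter2D(img, -1, kernel, anchor=(0, 0))
    >>> # Keep only the positions where the whole kernel overlaps the image; this
    >>> # should match signal.correlate2d(img, kernel, mode='valid') up to
    >>> # floating-point differences.
    >>> dst1_valid = dst1_full[:img.shape[0] - kernel.shape[0] + 1,
    ...                        :img.shape[1] - kernel.shape[1] + 1]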
    
  • 2020-12-24 04:07

    On my machine, a hand-crafted circular convolution using FFTs seems to be the fastest:

    import numpy
    x = numpy.random.random((2048, 2048)).astype(numpy.float32)
    y = numpy.random.random((32, 32)).astype(numpy.float32)
    z = numpy.fft.irfft2(numpy.fft.rfft2(x) * numpy.fft.rfft2(y, x.shape))
    

    Note that this might treat the areas close to the edges differently from the other approaches, because it's a circular convolution.
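
    If you need the edges handled like a linear convolution instead, a minimal sketch is to zero-pad both inputs to the full output size before the FFTs, so the wrap-around lands entirely in the padding, and then crop; this is roughly what scipy.signal.fftconvolve does internally (it also rounds the padded size up to a fast FFT length):

    import numpy
    x = numpy.random.random((2048, 2048)).astype(numpy.float32)
    y = numpy.random.random((32, 32)).astype(numpy.float32)

    # Pad to the 'full' linear-convolution shape so the circular wrap-around
    # falls into the zero padding instead of corrupting real output values.
    full_shape = (x.shape[0] + y.shape[0] - 1, x.shape[1] + y.shape[1] - 1)
    z_full = numpy.fft.irfft2(numpy.fft.rfft2(x, full_shape) *
                              numpy.fft.rfft2(y, full_shape), full_shape)

    # Crop to the 'valid' region (positions where y fits entirely inside x).
    z_valid = z_full[y.shape[0] - 1:x.shape[0], y.shape[1] - 1:x.shape[1]]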

  • 2020-12-24 04:14

    SciPy has a function, fftconvolve, which can be used for 1D and 2D signals.

    from scipy import signal
    from scipy import misc
    import numpy as np
    import matplotlib.pyplot as plt
    
    face = misc.face(gray=True)  # note: scipy.misc.face moved to scipy.datasets.face in SciPy >= 1.10
    kernel = np.outer(signal.windows.gaussian(70, 8), signal.windows.gaussian(70, 8))
    blurred = signal.fftconvolve(face, kernel, mode='same')
    
    fig, (ax_orig, ax_kernel, ax_blurred) = plt.subplots(3, 1, figsize=(6, 15))
    ax_orig.imshow(face, cmap='gray')
    ax_orig.set_title('Original')
    ax_orig.set_axis_off()
    ax_kernel.imshow(kernel, cmap='gray')
    ax_kernel.set_title('Gaussian kernel')
    ax_kernel.set_axis_off()
    ax_blurred.imshow(blurred, cmap='gray')
    ax_blurred.set_title('Blurred')
    ax_blurred.set_axis_off()
    plt.show()
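
    The same function handles 1D signals as well; a minimal sketch smoothing a noisy sine with a moving-average window:

    import numpy as np
    from scipy import signal

    t = np.linspace(0, 1, 2000)
    noisy = np.sin(2 * np.pi * 5 * t) + 0.3 * np.random.randn(t.size)

    # Normalize the window so the filter has unit gain and preserves the signal level.
    win = np.ones(50) / 50.0
    smoothed = signal.fftconvolve(noisy, win, mode='same')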
    
  • 2020-12-24 04:16

    I did some experiments with this too. My guess is that the SciPy convolution does not use the BLAS library to accelerate the computation. Using BLAS, I was able to code a 2D convolution that was comparable in speed to MATLAB's. It's more work, but your best bet is to recode the convolution in C++.

    Here is the tight part of the loop (please forgive the weird ()-based array referencing; it is my convenience class for MATLAB arrays). The key point is that you don't iterate over the image: you iterate over the filter and let BLAS iterate over the image, because typically the image is much larger than the filter.

    // Iterate over the (small) filter; for each filter tap, let BLAS sweep the
    // (large) image. diffM and diffN are the 'valid' output dimensions
    // (presumably image.numRows - filt.numRows + 1 and image.numCols - filt.numCols + 1).
    for (int n = 0; n < filt.numCols; n++)
    {
      for (int m = 0; m < filt.numRows; m++)
      {
        // Flipped indexing makes this a convolution rather than a correlation.
        const double filt_val = filt(filt.numRows-1-m, filt.numCols-1-n);
        for (int i = 0; i < diffN; i++)
        {
          double *out_ptr = &outImage(0, i);
          const double *im_ptr = &image(m, i+n);
          // outImage(0:diffM-1, i) += filt_val * image(m:m+diffM-1, i+n)
          // (column-wise axpy, assuming MATLAB-style column-major storage).
          cblas_daxpy(diffM, filt_val, im_ptr, 1, out_ptr, 1);
        }
      }
    }
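
    The same strategy can be sketched in plain NumPy (a rough analogue of the loop above, not a line-for-line port): iterate over the small filter and let vectorized array adds sweep the large image, producing a 'valid'-mode convolution.

    import numpy as np

    def convolve2d_valid(image, filt):
        """'valid'-mode 2D convolution: iterate over the filter taps and
        vectorize over the image, the same access pattern as the BLAS loop."""
        out_rows = image.shape[0] - filt.shape[0] + 1
        out_cols = image.shape[1] - filt.shape[1] + 1
        out = np.zeros((out_rows, out_cols), dtype=np.result_type(image, filt))
        for m in range(filt.shape[0]):
            for n in range(filt.shape[1]):
                # Flipped filter indexing makes this a convolution, not a correlation.
                out += (filt[filt.shape[0] - 1 - m, filt.shape[1] - 1 - n]
                        * image[m:m + out_rows, n:n + out_cols])
        return out

    For a 2048x2048 image and a 32x32 kernel that is 1024 vectorized passes over the image instead of ~4 million per-pixel dot products, which is the point of iterating over the filter rather than the image.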
    
  • 2020-12-24 04:21

    It really depends on what you want to do... A lot of the time, you don't need a fully generic (read: slower) 2D convolution. If the filter is separable, you can use two 1D convolutions instead; this is why scipy.ndimage.gaussian_filter, scipy.ndimage.uniform_filter, etc. are much faster than the same thing implemented as a generic n-D convolution.
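
    As a concrete illustration of the separable case, a minimal sketch using a Gaussian kernel (chosen because the 2D kernel factors exactly into an outer product of 1D kernels):

    import numpy as np
    from scipy import ndimage, signal

    x = np.random.random((2048, 2048)).astype(np.float32)

    # A separable 32x32 kernel: the outer product of two 1D Gaussian windows.
    g = signal.windows.gaussian(32, 4)
    kernel_2d = np.outer(g, g)

    # Generic 2D convolution...
    full_2d = ndimage.convolve(x, kernel_2d)

    # ...versus one 1D pass per axis, which should give the same result (up to
    # floating-point error) with ~32+32 multiplies per pixel instead of 32*32.
    sep = ndimage.convolve1d(ndimage.convolve1d(x, g, axis=0), g, axis=1)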

    At any rate, as a point of comparison:

    t = timeit.timeit(stmt='ndimage.convolve(x, y, output=x)', number=1,
    setup="""
    import numpy as np
    from scipy import ndimage
    x = np.random.random((2048, 2048)).astype(np.float32)
    y = np.random.random((32, 32)).astype(np.float32)
    """)
    print(t)
    

    This takes 6.9 sec on my machine...

    Compare this with fftconvolve:

    t = timeit.timeit(stmt="signal.fftconvolve(x, y, mode='same')", number=1,
    setup="""
    import numpy as np
    from scipy import signal
    x = np.random.random((2048, 2048)).astype(np.float32)
    y = np.random.random((32, 32)).astype(np.float32)
    """)
    print(t)
    

    This takes about 10.8 secs. However, with different input sizes, using FFTs to do a convolution can be considerably faster (though I can't seem to come up with a good example at the moment...).
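
    If you want to explore that crossover yourself, here is a small harness along those lines (a sketch; the break-even point depends on your hardware and on the image/kernel sizes):

    import timeit

    setup = """
    import numpy as np
    from scipy import ndimage, signal
    x = np.random.random((2048, 2048)).astype(np.float32)
    y = np.random.random(({k}, {k})).astype(np.float32)
    """

    for k in (8, 32, 64):
        direct = timeit.timeit("ndimage.convolve(x, y)",
                               setup=setup.format(k=k), number=1)
        fft = timeit.timeit("signal.fftconvolve(x, y, mode='same')",
                            setup=setup.format(k=k), number=1)
        print("kernel %dx%d: direct %.2f s, fft %.2f s" % (k, k, direct, fft))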
