Several users have asked about the speed or memory consumption of image convolutions in NumPy or SciPy [1, 2, 3, 4]. From the responses and my experience using NumPy, I bel
I did some experiments with this too. My guess is that the SciPy convolution does not use the BLAS library to accelerate the computation. Using BLAS, I was able to code a 2D convolution that was comparable in speed to MATLAB's. It's more work, but your best bet is to recode the convolution in C++.
Here is the tight inner part of the loop. (Please forgive the odd ()-based array indexing; it comes from my convenience class for MATLAB arrays.) The key point is that you don't iterate over the image: you iterate over the filter and let BLAS iterate over the image, because the image is typically much larger than the filter.
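// outImage must be zero-filled beforehand: daxpy accumulates (y := a*x + y).
for (int n = 0; n < filt.numCols; n++)
{
    for (int m = 0; m < filt.numRows; m++)
    {
        // Index the filter back to front: convolution flips the kernel.
        const double filt_val = filt(filt.numRows - 1 - m, filt.numCols - 1 - n);
        for (int i = 0; i < diffN; i++)
        {
            // One call per output column: add filt_val times diffM
            // contiguous image samples into that column.
            double *out_ptr = &outImage(0, i);
            const double *im_ptr = &image(m, i + n);
            cblas_daxpy(diffM, filt_val, im_ptr, 1, out_ptr, 1);
        }
    }
}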
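
For anyone who wants to try the idea without my array class, here is a minimal self-contained sketch of the same loop structure. Everything in it is an assumption for illustration: `Mat` is a hypothetical column-major stand-in for the convenience class, `axpy` is a plain loop with the same unit-stride behaviour as `cblas_daxpy`, and `conv2_valid` assumes a "valid" convolution, so the output is (image rows - filter rows + 1) by (image cols - filter cols + 1). That is my guess at what `diffM` and `diffN` hold above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical column-major matrix, standing in for the convenience class.
// Element (r, c) is stored at data[c * rows + r], as in MATLAB.
struct Mat {
    std::size_t rows, cols;
    std::vector<double> data;
    Mat(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& operator()(std::size_t r, std::size_t c) { return data[c * rows + r]; }
    const double& operator()(std::size_t r, std::size_t c) const { return data[c * rows + r]; }
};

// Plain-loop stand-in for cblas_daxpy with unit strides: y[k] += a * x[k].
// When linking against a BLAS, cblas_daxpy(n, a, x, 1, y, 1) does the same job.
inline void axpy(std::size_t n, double a, const double* x, double* y) {
    for (std::size_t k = 0; k < n; ++k) y[k] += a * x[k];
}

// "Valid" 2D convolution: loop over filter taps, and for each tap add a
// scaled column segment of the image into every output column.
Mat conv2_valid(const Mat& image, const Mat& filt) {
    assert(filt.rows > 0 && filt.cols > 0);
    assert(filt.rows <= image.rows && filt.cols <= image.cols);
    const std::size_t outRows = image.rows - filt.rows + 1;
    const std::size_t outCols = image.cols - filt.cols + 1;
    Mat out(outRows, outCols);  // zero-initialized, which axpy relies on
    for (std::size_t n = 0; n < filt.cols; ++n) {
        for (std::size_t m = 0; m < filt.rows; ++m) {
            // Flipped filter index, as required for convolution.
            const double w = filt(filt.rows - 1 - m, filt.cols - 1 - n);
            for (std::size_t i = 0; i < outCols; ++i) {
                axpy(outRows, w, &image(m, i + n), &out(0, i));
            }
        }
    }
    return out;
}
```

The number of `axpy` calls is filter rows × filter cols × output cols, and each call walks a contiguous run of one image column, which is the access pattern BLAS handles well. I have not benchmarked this sketch; the plain-loop `axpy` is only there so it compiles without a BLAS installed.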