Question
In image processing and classification networks, a common task is the convolution or cross-correlation of input images with a set of fixed filters; in convolutional neural nets (CNNs), for example, this operation is everywhere. I have reduced the general task to this:
Given: a batch of N images [N,H,W,D,...] and a set of K filters [K,H,W,D,...]
Return: an ndarray that represents the m-dimensional cross-correlation (xcorr) of image N_i with filter K_j, for every N_i in N and K_j in K
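For concreteness, the max-of-xcorr variant of this task (the one used below) can be written as a naive double loop over all pairs; this is just a reference for the desired output shape and values, with small array sizes for illustration:

```python
import numpy as np
from scipy.signal import correlate

# Naive reference: result[i, j] = max of the cross-correlation of image i with filter j
images = np.random.random([4, 8, 8, 3])   # small N, H, W, D for illustration
filters = np.random.random([5, 8, 8, 3])  # small K

result = np.empty([images.shape[0], filters.shape[0]])
for i, img in enumerate(images):
    for j, filt in enumerate(filters):
        result[i, j] = np.max(correlate(img, filt, mode='same', method='fft'))

print(result.shape)  # (4, 5)
```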
Currently, I am using scipy.spatial.cdist with a custom function that returns the max of the xcorr of two m-dim images, computed via scipy.signal.correlate. The code looks something like this:
import numpy as np
from scipy.spatial.distance import cdist
from scipy.signal import correlate

def xcorr(u, v):
    '''unfortunately, cdist only takes 2D arrays, so need to reshape back'''
    u = np.reshape(u, [96, 96, 3])
    v = np.reshape(v, [96, 96, 3])
    return np.max(correlate(u, v, mode='same', method='fft'))

batch_images = np.random.random([500, 96, 96, 3])
my_filters = np.random.random([1000, 96, 96, 3])

# unfortunately, cdist only takes 2D arrays, so need to flatten first
batch_vec = np.reshape(batch_images, [-1, np.prod(batch_images.shape[1:])])
filt_vec = np.reshape(my_filters, [-1, np.prod(my_filters.shape[1:])])
answer = cdist(batch_vec, filt_vec, xcorr)
The method works, and it's nice that cdist automatically parallelizes across threads, but it is actually quite slow. I am guessing this is due to a number of reasons, including non-optimal cache use between threads (e.g. keeping one filter fixed in cache while correlating all the images against it, or vice versa), the reshape operation inside xcorr, etc.
Does the community have any ideas how to speed this up? I realize in my example xcorr takes the maximum over the cross-correlation between both images, but this was just an example that was fit to work with cdist. Ideally, you could perform this batch operation and use some other function (or none) to get the output you wanted. Ideal solutions could handle (R,G,B,D,...) data.
Any/all help appreciated, including but not limited to wrapping C, although Python/numpy solutions are preferred. I saw some posts related to einsum notation, but I am not super familiar with it, so help on that front is also welcome. I welcome tensorflow solutions IF they get the same answer (within reasonable precision) as the corresponding slow numpy version.
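For what it's worth, my current understanding is that einsum alone handles only a single fixed shift, not a sliding correlation; for example, the all-pairs zero-shift correlation can be written as:

```python
import numpy as np

images = np.random.random([4, 8, 8, 3])
filters = np.random.random([5, 8, 8, 3])

# All-pairs correlation at a single (zero) shift: sum over every spatial
# position h, w and channel d, leaving the batch axes n and k free.
zero_shift = np.einsum('nhwd,khwd->nk', images, filters)   # shape (4, 5)

# Equivalent to a flattened matrix product:
flat = images.reshape(4, -1) @ filters.reshape(5, -1).T
assert np.allclose(zero_shift, flat)
```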
Source: https://stackoverflow.com/questions/50239641/multi-dimensional-batch-image-convolution-using-numpy