Multi-Dimensional Batch-Image Convolution using NumPy

喜你入骨 submitted on 2019-12-10 18:22:55

Question


In image processing and classification networks, a common task is the convolution or cross-correlation of input images with some fixed filters. In convolutional neural nets (CNNs), for example, this is an extremely common operation. I have reduced the general version of the task to the following:

Given: a batch of N images [N,H,W,D,...] and a set of K filters [K,H,W,D,...]

Return: an ndarray containing the m-dimensional cross-correlation (xcorr) of image N_i with filter K_j, for every N_i in N and every K_j in K
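Written out under the convention of scipy.signal.correlate (the function the code in this question uses, with real-valued inputs and out-of-range samples treated as zero), the requested quantity and its max-reduced form can be stated as below. Here S is the shared image/filter shape, ⊙ is element-wise multiplication, and A_ij is the value the xcorr function in this question returns for image i and filter j; mode='same' keeps only the S lags around zero.

```latex
\begin{aligned}
C_{ij}[\boldsymbol{\tau}] &= \sum_{\mathbf{x}} N_i[\mathbf{x} + \boldsymbol{\tau}]\, K_j[\mathbf{x}],
  \qquad A_{ij} = \max_{\boldsymbol{\tau}} C_{ij}[\boldsymbol{\tau}] \\
C_{ij}[\boldsymbol{\tau}] &= \big(N_i * \tilde{K}_j\big)[\boldsymbol{\tau} + \mathbf{S} - \mathbf{1}],
  \qquad \tilde{K}_j[\mathbf{x}] = K_j[\mathbf{S} - \mathbf{1} - \mathbf{x}] \\
N_i * \tilde{K}_j &= \mathcal{F}^{-1}\!\left[\mathcal{F}(N_i) \odot \mathcal{F}(\tilde{K}_j)\right]
  \quad \text{(FFTs zero-padded to size } 2\mathbf{S} - \mathbf{1}\text{)}
\end{aligned}
```

The last two lines (the convolution theorem applied to the reversed filter) are what matter for speed: F(N_i) depends only on i and F(K̃_j) only on j, so each transform is needed once rather than once per (image, filter) pair.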

Currently, I am using scipy.spatial.distance.cdist with a custom metric function that returns the maximum of the cross-correlation of two m-dimensional images, computed with scipy.signal.correlate. The code looks something like this:

import numpy as np
from scipy.spatial.distance import cdist
from scipy.signal import correlate

def xcorr(u, v):
    '''cdist passes flattened 1D rows, so restore the image shape first'''
    u = np.reshape(u, [96, 96, 3])
    v = np.reshape(v, [96, 96, 3])
    # full 3-D cross-correlation (over H, W and channels), reduced to its maximum
    return np.max(correlate(u, v, mode='same', method='fft'))

batch_images = np.random.random([500, 96, 96, 3])
my_filters = np.random.random([1000, 96, 96, 3])

# unfortunately, cdist only takes 2D arrays, so flatten each image/filter into one row
batch_vec = np.reshape(batch_images, [-1, np.prod(batch_images.shape[1:])])
filt_vec = np.reshape(my_filters, [-1, np.prod(my_filters.shape[1:])])

answer = cdist(batch_vec, filt_vec, xcorr)   # shape (500, 1000)

The method works, but it is quite slow. Note that cdist is not parallelized here: with a Python callable as the metric, it simply calls xcorr once per (image, filter) pair, i.e. 500 × 1000 = 500,000 separate FFT correlations. I am guessing the slowness has several causes, including non-optimal cache use (e.g. one could keep a filter fixed in cache while filtering all the images, or vice versa), the reshape operations inside xcorr, and recomputing the FFT of every image and every filter for each pair.
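One way to remove that per-pair overhead is to batch the FFT route with NumPy broadcasting. The sketch below is an illustration under stated assumptions, not a drop-in replacement: batch_xcorr_same is a hypothetical name, it assumes real-valued images and filters that share the same shape S, and, like the question's xcorr, it correlates over every non-batch axis (including channels). Each axis is zero-padded to 2S-1 so the FFT product gives a linear rather than circular correlation, and the result is cropped to the centred window that mode='same' returns.

```python
import numpy as np


def batch_xcorr_same(images, filters):
    """Hypothetical helper: 'same'-mode cross-correlation of every image with every filter.

    images: (N, *S) and filters: (K, *S), real-valued, sharing the trailing shape S.
    Correlates over all non-batch axes and returns an array of shape (N, K, *S).
    """
    spatial = images.shape[1:]
    if filters.shape[1:] != spatial:
        raise ValueError("this sketch assumes images and filters share the same shape")
    ndim = len(spatial)
    axes = tuple(range(1, ndim + 1))        # every axis except the batch axis
    full = [2 * s - 1 for s in spatial]     # zero-pad so the FFT product is a linear correlation

    # One FFT per image and one per reversed filter, instead of two per (image, filter) pair.
    f_img = np.fft.rfftn(images, s=full, axes=axes)
    f_flt = np.fft.rfftn(np.flip(filters, axis=axes), s=full, axes=axes)

    # Broadcast to all pairs: (N, 1, ...) * (1, K, ...) -> (N, K, ...).
    prod = f_img[:, None] * f_flt[None, :]
    corr = np.fft.irfftn(prod, s=full, axes=tuple(a + 1 for a in axes))

    # Keep the centred window of shape S, which is what mode='same' returns.
    crop = tuple(slice((f - s) // 2, (f - s) // 2 + s) for f, s in zip(full, spatial))
    return corr[(slice(None), slice(None)) + crop]
```

On small inputs, batch_xcorr_same(batch_images, my_filters).max(axis=(2, 3, 4)) should agree with the cdist answer up to floating-point error. At the question's sizes (500 × 1000 pairs of 96×96×3 arrays), though, the broadcast product alone would need hundreds of gigabytes, so in practice the pairs have to be processed in chunks.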

Does anyone have ideas on how to speed this up? I realize that in my example xcorr takes the maximum of the cross-correlation of the two images, but that reduction was only chosen so the example would fit cdist, which needs one scalar per pair. Ideally, you could perform this batch operation once and then apply some other function (or none) to get the output you want. Ideal solutions could handle (R,G,B,D,...) data.
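To keep the reduction separate from the correlation while bounding memory, the same FFT route can loop over chunks of filters and apply an arbitrary reduction to each chunk. This is again a sketch under assumptions: batch_xcorr_reduce, its reduce hook (any callable that accepts an axis tuple, such as np.max) and its chunk parameter are hypothetical names introduced here. Only the filters are chunked; the image FFTs are computed once and reused for every chunk.

```python
import numpy as np


def batch_xcorr_reduce(images, filters, reduce=None, chunk=4):
    """Hypothetical helper: 'same'-mode cross-correlation over all non-batch axes,
    processed `chunk` filters at a time, with an optional per-pair reduction.

    reduce: callable taking (array, axis=tuple), e.g. np.max; None keeps the full
            correlations, giving shape (N, K, *S); otherwise (N, K) for np.max.
    """
    spatial = images.shape[1:]
    ndim = len(spatial)
    axes = tuple(range(1, ndim + 1))          # correlated axes of images / filters
    pair_axes = tuple(range(2, ndim + 2))     # correlated axes of an (N, k, ...) block
    full = [2 * s - 1 for s in spatial]       # zero-padding for linear correlation
    crop = tuple(slice((f - s) // 2, (f - s) // 2 + s) for f, s in zip(full, spatial))

    f_img = np.fft.rfftn(images, s=full, axes=axes)[:, None]   # (N, 1, ...), computed once
    blocks = []
    for start in range(0, filters.shape[0], chunk):
        flt = np.flip(filters[start:start + chunk], axis=axes)
        f_flt = np.fft.rfftn(flt, s=full, axes=axes)[None]      # (1, k, ...)
        corr = np.fft.irfftn(f_img * f_flt, s=full, axes=pair_axes)
        corr = corr[(slice(None), slice(None)) + crop]          # centred 'same' window
        blocks.append(corr if reduce is None else reduce(corr, axis=pair_axes))
    return np.concatenate(blocks, axis=1)
```

With the question's arrays, batch_xcorr_reduce(batch_images, my_filters, reduce=np.max) would return a (500, 1000) array comparable to the cdist answer, while reduce=None returns the full correlations, which is only practical for small batches. Each iteration holds roughly N × chunk padded spectra, so chunk trades memory against the number of Python-level iterations; for very large N, the images could be chunked the same way.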

Any help is appreciated, including wrapping C, although Python/NumPy solutions are preferred. I have seen some posts related to einsum notation, but I am not very familiar with it, so any help there would be appreciated. I also welcome TensorFlow solutions, provided they give the same answer (within reasonable precision) as the corresponding slow NumPy version.
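As an illustration of the einsum notation, np.einsum can express the "every image with every filter" pairing directly in the Fourier domain. Note that this sketch assumes a different operation from the xcorr above: the usual CNN convention of correlating over H and W only and summing over the D channels, which corresponds to the zero channel-lag slice of the full 3-D correlation. The function name batch_xcorr2d_channels_summed is hypothetical. In the subscripts 'nhwd,khwd->nkhw', n indexes images, k filters, h and w frequency bins, and d is summed because it is absent from the output; 'nhwd,khwd->nkhwd' would keep the channels separate.

```python
import numpy as np


def batch_xcorr2d_channels_summed(images, filters):
    """Hypothetical helper: CNN-style 'same' cross-correlation over (H, W), summed over channels.

    images: (N, H, W, D), filters: (K, H, W, D), real-valued. Returns (N, K, H, W).
    """
    _, h, w, _ = images.shape
    full = (2 * h - 1, 2 * w - 1)                  # zero-padding avoids wrap-around
    f_img = np.fft.rfft2(images, s=full, axes=(1, 2))
    f_flt = np.fft.rfft2(np.flip(filters, axis=(1, 2)), s=full, axes=(1, 2))
    # n: image, k: filter, h/w: frequency bins, d: channel (summed, absent from the output)
    prod = np.einsum('nhwd,khwd->nkhw', f_img, f_flt)
    corr = np.fft.irfft2(prod, s=full, axes=(2, 3))
    top, left = (h - 1) // 2, (w - 1) // 2        # centred 'same' window
    return corr[:, :, top:top + h, left:left + w]
```

Summing over d inside the einsum is valid because the FFT is linear: the sum of the per-channel correlations equals the inverse FFT of the summed per-channel spectral products.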

Source: https://stackoverflow.com/questions/50239641/multi-dimensional-batch-image-convolution-using-numpy
