Scipy ndimage.convolve skips the summation of channels

Question

I'm trying to use scipy's ndimage.convolve function to perform a convolution on a 3 dimensional image (RGB, width, height).

Looking at the cs231n convolution illustration (image not reproduced here), it is clear that for any input, each kernel/filter should only ever produce an output of NxN, with a depth of exactly 1.

This is a problem with scipy: when you call ndimage.convolve with an input of size (3, 5, 5) and a filter/kernel of size (3, 3, 3), the output has size (3, 5, 5), so the different channels are clearly not being summed.

Is there a way to force this summation without doing it manually? I try to do as little as possible in base Python, since many external libraries are written in C++ and perform the same operations faster. Or is there an alternative?

Answer 1:

No, scipy doesn't skip the summation of channels. ndimage.convolve treats the channel axis like any other axis: it pads the input array along all axes and then performs the convolution in "same" mode (i.e. the output has the same shape as the input, centered with respect to the output of the "full" mode). That is why you get a (3, 5, 5) output. See the scipy.signal.convolve documentation for more detail on modes.
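As a quick check of the shapes involved, the sketch below compares the three modes of scipy.signal.convolve with ndimage.convolve. The arrays `image` and `kernel` are placeholder data chosen only for their shapes; they are not the arrays from the question.

```python
import numpy as np
from scipy import ndimage, signal

# Placeholder data: only the shapes matter here.
image = np.arange(3 * 5 * 5).reshape(3, 5, 5)
kernel = np.ones((3, 3, 3), dtype=int)

# 'full': every overlap position, so each axis grows by kernel_size - 1.
full_shape = signal.convolve(image, kernel, mode="full").shape

# 'same': the central part of 'full', with the same shape as the input.
same_shape = signal.convolve(image, kernel, mode="same").shape

# 'valid': only positions where the kernel fits entirely inside the input.
valid_shape = signal.convolve(image, kernel, mode="valid").shape

# ndimage.convolve always returns an array shaped like its input.
nd_shape = ndimage.convolve(image, kernel, mode="constant", cval=0).shape

print(full_shape, same_shape, valid_shape, nd_shape)
```

Only the "valid" result has a depth of 1 along the first axis, which is the shape the question expects.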

For your input of shape (3, 5, 5) and filter w0 of shape (3, 3, 3), the input is padded by k - 1 = 2 elements on each side of every axis, resulting in a (7, 9, 9) array. See below (for simplicity I use constant padding with 0's):

import numpy as np
import scipy.ndimage
import scipy.signal

a = np.array([[[2, 0, 2, 2, 2],
               [1, 1, 0, 2, 0],
               [0, 0, 1, 2, 2],
               [2, 2, 2, 0, 0],
               [1, 0, 1, 2, 0]],

              [[1, 2, 1, 0, 1],
               [0, 2, 0, 0, 1],
               [0, 0, 2, 2, 1],
               [2, 0, 1, 0, 2],
               [0, 1, 2, 2, 2]],

              [[0, 0, 2, 2, 2],
               [0, 1, 2, 1, 0],
               [0, 0, 0, 2, 0],
               [0, 2, 0, 0, 2],
               [0, 0, 2, 2, 1]]])

w0 = np.array([[[0,  1, -1],
                [1, -1,  0],
                [0,  0,  0]],

               [[1,  0,  0],
                [0, -1,  1],
                [1,  0,  1]],

               [[ 1, -1,  0],
                [-1,  0, -1],
                [-1,  0,  1]]])

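k = w0.shape[0]  # kernel size along each axis (3)

# Zero-pad by k - 1 on both sides of every axis: (3, 5, 5) -> (7, 9, 9)
a_p = np.pad(a, k - 1)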

array([[[0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0]],

       [[0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0]],

       [[0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 2, 0, 2, 2, 2, 0, 0],
        [0, 0, 1, 1, 0, 2, 0, 0, 0],
        [0, 0, 0, 0, 1, 2, 2, 0, 0],
        [0, 0, 2, 2, 2, 0, 0, 0, 0],
        [0, 0, 1, 0, 1, 2, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0]],

       [[0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 1, 2, 1, 0, 1, 0, 0],
        [0, 0, 0, 2, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 2, 2, 1, 0, 0],
        [0, 0, 2, 0, 1, 0, 2, 0, 0],
        [0, 0, 0, 1, 2, 2, 2, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0]],

       [[0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 2, 2, 2, 0, 0],
        [0, 0, 0, 1, 2, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 2, 0, 0, 0],
        [0, 0, 0, 2, 0, 0, 2, 0, 0],
        [0, 0, 0, 0, 2, 2, 1, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0]],

       [[0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0]],

       [[0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0]]])

Before proceeding, note that the image from cs231n shows a correlation, not a convolution, so we need to either flip w0 or use a correlation function instead (I will do the former).
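The equivalence between the two options can be checked with a small sketch. The arrays `x` and `w` are random stand-ins (hypothetical names, not the question's data), and both kernel sizes are odd, as in the question.

```python
import numpy as np
from scipy import ndimage, signal

rng = np.random.default_rng(0)
x = rng.integers(-3, 4, size=(3, 5, 5))  # stand-in for the image
w = rng.integers(-2, 3, size=(3, 3, 3))  # stand-in for the filter

# Option 1: convolve with the kernel flipped along all axes.
conv_flipped = signal.convolve(x, np.flip(w), mode="same")
nd_conv_flipped = ndimage.convolve(x, np.flip(w), mode="constant", cval=0)

# Option 2: correlate with the kernel as given.
corr = signal.correlate(x, w, mode="same")
nd_corr = ndimage.correlate(x, w, mode="constant", cval=0)

print(np.array_equal(conv_flipped, corr))
print(np.array_equal(nd_conv_flipped, nd_corr))
```

Note that np.flip without an axis argument reverses every axis, including the channel axis.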

Then, the convolution is performed by sliding along the first dimension (axis 0), i.e. the (flipped) w0 is convolved with a_p[0:3], then with a_p[1:4], then with a_p[2:5], then with a_p[3:6] and finally with a_p[4:7]. Each step yields a (1, 7, 7) array because of the summation over the channels. These are then stacked together, resulting in a (5, 7, 7) array. To show this I use scipy.signal.convolve, which supports the full mode:

out = scipy.signal.convolve(a, np.flip(w0), mode='full')

array([[[ 2,  0,  0,  2,  0, -2, -2],
        [-1,  1, -5, -1, -4, -4, -2],
        [-1, -3,  2, -3,  1, -4,  0],
        [ 2,  1, -1, -3, -7,  0, -2],
        [-1, -2, -4, -1, -4, -2,  2],
        [-1, -2, -2, -2,  1, -2,  0],
        [ 0, -1,  1, -1, -1,  2,  0]],

       [[ 3,  2,  4,  0,  4,  2,  1],
        [ 2, -1,  1, -1, -1,  0, -2],
        [ 1, -3,  3,  5,  2,  1,  3],
        [ 4,  2,  1,  4,  0, -3, -2],
        [ 1,  1,  1, -1, -1,  3, -1],
        [ 1, -4,  3, -1, -3, -4,  0],
        [ 0,  0,  0, -1,  1,  2,  2]],

       [[ 1,  2,  4,  4,  2, -2, -1],
        [ 1,  2,  1, -3, -4, -4,  1],
        [-2,  2, -3,  3,  1,  2,  4],
        [ 1,  2,  5, -6,  6, -2,  3],
        [ 2, -5,  4,  1,  5,  4,  0],
        [-2,  0,  0,  1, -3, -4,  3],
        [-1,  1, -1, -2,  4,  3,  3]],

       [[ 0,  0,  2,  2,  4,  2,  2],
        [ 0,  0,  3,  3,  3, -2,  1],
        [-1,  0,  0,  4,  0,  4,  3],
        [ 0,  0,  2,  3,  1,  3,  3],
        [ 0,  0,  0,  1,  7,  1,  3],
        [-2,  2,  0,  2, -3,  1,  4],
        [ 0, -1, -1,  0,  2,  4,  1]],

       [[ 0,  0,  0,  0,  0,  0,  0],
        [ 0,  0,  0, -2,  0,  0,  2],
        [ 0,  0, -3, -1,  1,  3,  0],
        [ 0, -1, -1,  1, -1,  2,  0],
        [ 0,  0, -2,  0,  2, -2,  2],
        [ 0, -2,  2, -2, -2,  3,  1],
        [ 0,  0, -2,  0,  1,  1,  0]]])
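The claim that each slice of the full output is a sum over channels can be verified with a minimal sketch. It uses random stand-in arrays `x` and `k` (hypothetical names; `k` plays the role of the already flipped kernel) and rebuilds every slice from 2D convolutions with scipy.signal.convolve2d.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
x = rng.integers(0, 3, size=(3, 5, 5))   # stand-in for the image
k = rng.integers(-1, 2, size=(3, 3, 3))  # stand-in for the flipped kernel

full = signal.convolve(x, k, mode="full")  # shape (5, 7, 7)

# Pad only axis 0 so that windows of 3 consecutive slices can slide along it.
x_p = np.pad(x, ((2, 2), (0, 0), (0, 0)))

slices = []
for i in range(full.shape[0]):
    window = x_p[i:i + 3]
    # Convolution reverses the kernel along axis 0, hence k[2 - c].
    # The three 2D results are summed: this is the channel summation.
    slices.append(sum(signal.convolve2d(window[c], k[2 - c], mode="full")
                      for c in range(3)))

stacked = np.stack(slices)
print(np.array_equal(stacked, full))
```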

To match the "same" mode used by ndimage.convolve, we need to keep only the central part of out:

out = out[1:-1, 1:-1, 1:-1]

array([[[-1,  1, -1, -1,  0],
        [-3,  3,  5,  2,  1],
        [ 2,  1,  4,  0, -3],
        [ 1,  1, -1, -1,  3],
        [-4,  3, -1, -3, -4]],

       [[ 2,  1, -3, -4, -4],
        [ 2, -3,  3,  1,  2],
        [ 2,  5, -6,  6, -2],
        [-5,  4,  1,  5,  4],
        [ 0,  0,  1, -3, -4]],

       [[ 0,  3,  3,  3, -2],
        [ 0,  0,  4,  0,  4],
        [ 0,  2,  3,  1,  3],
        [ 0,  0,  1,  7,  1],
        [ 2,  0,  2, -3,  1]]])

This is exactly what you get if you run scipy.ndimage.convolve(a, np.flip(w0), mode='constant', cval=0). Finally, to get the desired output we need to discard the elements that relied on padding along the first dimension (i.e. keep only the middle slice of out), apply a stride of s = 2 (i.e. out[1][::s, ::s]), and add the bias b = 1:

s, b = 2, 1  # stride and bias
out[1][::s, ::s] + b

array([[ 3, -2, -3],
       [ 3, -5, -1],
       [ 1,  2, -3]])

Putting everything in one line:

scipy.ndimage.convolve(a, np.flip(w0), mode='constant', cval=0)[1][::2, ::2] + b

# or using scipy.signal.convolve
# scipy.signal.convolve(a, np.flip(w0), 'full')[2][1:-1,1:-1][::2, ::2] + b
# or
# scipy.signal.convolve(a, np.flip(w0), 'same')[1][::2, ::2] + b
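As a possible alternative that avoids computing the extra channel slices, one could pad only the two spatial axes and use scipy.signal.correlate in "valid" mode, which leaves a single position along the channel axis. This is a sketch under the assumptions of the example above (padding 1, stride 2, bias 1), not the only way to do it; the name `a_sp` is introduced here for illustration.

```python
import numpy as np
from scipy import ndimage, signal

a = np.array([
    [[2, 0, 2, 2, 2], [1, 1, 0, 2, 0], [0, 0, 1, 2, 2], [2, 2, 2, 0, 0], [1, 0, 1, 2, 0]],
    [[1, 2, 1, 0, 1], [0, 2, 0, 0, 1], [0, 0, 2, 2, 1], [2, 0, 1, 0, 2], [0, 1, 2, 2, 2]],
    [[0, 0, 2, 2, 2], [0, 1, 2, 1, 0], [0, 0, 0, 2, 0], [0, 2, 0, 0, 2], [0, 0, 2, 2, 1]],
])
w0 = np.array([
    [[0, 1, -1], [1, -1, 0], [0, 0, 0]],
    [[1, 0, 0], [0, -1, 1], [1, 0, 1]],
    [[1, -1, 0], [-1, 0, -1], [-1, 0, 1]],
])
b, s = 1, 2  # bias and stride

# Zero-pad the two spatial axes by 1; leave the channel axis unpadded.
a_sp = np.pad(a, ((0, 0), (1, 1), (1, 1)))

# 'valid' keeps only positions where the kernel fits entirely, so the
# channel axis collapses to length 1 and the channels are summed.
summed = signal.correlate(a_sp, w0, mode="valid")  # shape (1, 5, 5)
result = summed[0][::s, ::s] + b

# Cross-check against the ndimage one-liner.
reference = ndimage.convolve(a, np.flip(w0), mode="constant", cval=0)[1][::s, ::s] + b
print(result)
print(np.array_equal(result, reference))
```

Because correlate is used directly, w0 does not need to be flipped here.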

Source: https://stackoverflow.com/questions/59782158/scipy-ndimage-convolve-skips-the-summation-of-channels

Tags

python

numpy

scipy

conv-neural-network

convolution