2-D convolution as a matrix-matrix multiplication

旧巷老猫 提交于 2019-11-27 17:37:41
Salvador Dali

Yes, it is possible and you should also use a doubly block circulant matrix (which is a special case of Toeplitz matrix). I will give you an example with a small size of kernel and the input, but it is possible to construct Toeplitz matrix for any kernel. So you have a 2d input x and 2d kernel k and you want to calculate the convolution x * k. Also let's assume that k is already flipped. Let's also assume that x is of size n×n and k is m×m.

So you unroll k into a sparse matrix of size (n-m+1)^2 × n^2, and unroll x into a long vector n^2 × 1. You compute a multiplication of this sparse matrix with a vector and convert the resulting vector (which will have a size (n-m+1)^2 × 1) into a n-m+1 square matrix.

I am pretty sure this is hard to understand just from reading. So here is an example for 2×2 kernel and 3×3 input.

*

Here is a constructed matrix with a vector:

which is equal to

.

And this is the same result you would have got by doing a sliding window of k over x.

1- Define Input and Filter

Let I be the input signal and F be the filter or kernel.

2- Calculate the final output size

If the I is m1 x n1 and F is m2 x n2 the size of the output will be:

3- Zero-pad the filter matrix

Zero pad the filter to make it the same size as the output.

4- Create Toeplitz matrix for each row of the zero-padded filter

5- Create a doubly blocked Toeplitz matrix

Now all these small Toeplitz matrices should be arranged in a big doubly blocked Toeplitz matrix.

6- Convert the input matrix to a column vector

7- Multiply doubly blocked toeplitz matrix with vectorized input signal

This multiplication gives the convolution result.

8- Last step: reshape the result to a matrix form

For more details and python code take a look at my github repository:

Step by step explanation of 2D convolution implemented as matrix multiplication using toeplitz matrices in python

If you unravel k to a m^2 vector and unroll X, you would then get:

  • a m**2 vectork
  • a ((n-m)**2, m**2) matrix for unrolled_X

where unrolled_X could be obtained by the following Python code:

from numpy import zeros


def unroll_matrix(X, m):
  flat_X = X.flatten()
  n = X.shape[0]
  unrolled_X = zeros(((n - m) ** 2, m**2))
  skipped = 0
  for i in range(n ** 2):
      if (i % n) < n - m and ((i / n) % n) < n - m:
          for j in range(m):
              for l in range(m):
                  unrolled_X[i - skipped, j * m + l] = flat_X[i + j * n + l]
      else:
          skipped += 1
  return unrolled_X

Unrolling X and not k allows a more compact representation (smaller matrices) than the other way around for each X - but you need to unroll each X. You could prefer unrolling k depending on what you want to do.

Here, the unrolled_X is not sparse, whereas unrolled_k would be sparse, but of size ((n-m+1)^2,n^2) as @Salvador Dali mentioned.

Unrolling k could be done like this:

from scipy.sparse import lil_matrix
from numpy import zeros
import scipy 


def unroll_kernel(kernel, n, sparse=True):

    m = kernel.shape[0]
    if sparse:
         unrolled_K = lil_matrix(((n - m)**2, n**2))
    else:
         unrolled_K = zeros(((n - m)**2, n**2))

    skipped = 0
    for i in range(n ** 2):
         if (i % n) < n - m and((i / n) % n) < n - m:
             for j in range(m):
                 for l in range(m):
                    unrolled_K[i - skipped, i + j * n + l] = kernel[j, l]
         else:
             skipped += 1
    return unrolled_K

The code shown above doesn't produce the unrolled matrix of the right dimensions. The dimension should be (n-k+1)*(m-k+1), (k)(k). k: filter dimension, n: num rows in input matrix, m: num columns.

def unfold_matrix(X, k):
    n, m = X.shape[0:2]
    xx = zeros(((n - k + 1) * (m - k + 1), k**2))
    row_num = 0
    def make_row(x):
        return x.flatten()

    for i in range(n- k+ 1):
        for j in range(m - k + 1):
            #collect block of m*m elements and convert to row
            xx[row_num,:] = make_row(X[i:i+k, j:j+k])
            row_num = row_num + 1

    return xx

For more details, see my blog post:

http://www.telesens.co/2018/04/09/initializing-weights-for-the-convolutional-and-fully-connected-layers/

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!