Say we have a single-channel image (5x5):
A = [ 1 2 3 4 5
      6 7 8 9 2
      1 4 5 6 3
      4 5 6 7 4
      3 4 5 6 2 ]
And a filter K (
For RGB-like inputs, the filter is actually 2x2x3: each 2x2 slice of the filter corresponds to one color channel, giving three filter responses. These three are summed into one value, followed by the bias and the activation. The result is one pixel in the output map.
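As a rough sketch of that last step, here is how one output pixel could be computed from a 2x2x3 patch. The patch values, kernel weights, bias, and the choice of ReLU as the activation are all made up for illustration (only the first channel reuses the top-left 2x2 corner of A); `output_pixel` is a hypothetical helper, not a library function.

```python
def relu(x):
    # ReLU is assumed here; the original does not name the activation.
    return max(0.0, x)

def output_pixel(patch, kernel, bias):
    """patch and kernel are lists of 3 channels, each a 2x2 nested list."""
    # One response per channel: elementwise product summed over the 2x2 window.
    responses = [
        sum(p * k
            for p_row, k_row in zip(p_ch, k_ch)
            for p, k in zip(p_row, k_row))
        for p_ch, k_ch in zip(patch, kernel)
    ]
    # The three responses add up to one value, then bias and activation follow.
    return responses, relu(sum(responses) + bias)

# Hypothetical 2x2x3 input patch (channel-first).
patch = [
    [[1, 2], [6, 7]],   # channel 1: top-left 2x2 corner of A
    [[0, 1], [1, 0]],   # channel 2: made-up values
    [[2, 2], [2, 2]],   # channel 3: made-up values
]
# Hypothetical 2x2x3 filter: one 2x2 slice per channel.
kernel = [
    [[1, 0], [0, 1]],
    [[1, 1], [1, 1]],
    [[0, -1], [-1, 0]],
]
bias = 0.5  # made-up bias

responses, pixel = output_pixel(patch, kernel, bias)
print(responses)  # [8, 2, -4]
print(pixel)      # 6.5
```

Sliding the same 2x2x3 filter over every position of the input and repeating this computation would fill in the rest of the output map.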