Batch Normalization in Convolutional Neural Network

Backend · Unresolved · 4 answers · 600 views
攒了一身酷 · 2020-12-07 08:16

I am a newbie in convolutional neural networks and only have an idea about feature maps and how convolution is applied to images to extract features. I would be glad to know some details about applying batch normalization in a CNN.

4 Answers
  •  孤城傲影
    2020-12-07 08:29

    Let's start with the terms. Remember that the output of a convolutional layer is a rank-4 tensor [B, H, W, C], where B is the batch size, (H, W) is the feature-map size, and C is the number of channels. An index (x, y), where 0 <= x < H and 0 <= y < W, is a spatial location.
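    To make the indexing concrete, here is a small NumPy sketch (the shape values are arbitrary):

```python
import numpy as np

# A toy conv-layer output: batch of 2, 3x3 feature maps, 4 channels.
B, H, W, C = 2, 3, 3, 4
t = np.random.randn(B, H, W, C)

# The channel vector at spatial location (x, y) = (1, 2) of batch element 0:
v = t[0, 1, 2, :]
```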

    Usual batchnorm

    Now, here's how batchnorm is applied in the usual way (NumPy-style code):

    # t is the incoming tensor of shape [B, H, W, C]
    # mean and stddev are computed along axis 0 and have shape [H, W, C]
    mean = t.mean(axis=0)
    stddev = t.std(axis=0)
    # every batch element is normalized with the same per-location statistics;
    # 1e-5 avoids division by zero
    out = (t - mean) / (stddev + 1e-5)
    

    Basically, it computes H*W*C means and H*W*C standard deviations, each across the B batch elements. Note that each spatial location gets its own mean and variance, gathered from only B values.
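    As a quick check of the shapes above (a sketch; the B, H, W, C values are arbitrary):

```python
import numpy as np

B, H, W, C = 8, 5, 5, 3
t = np.random.randn(B, H, W, C)

# One statistic per (x, y, channel) location, each computed over B values.
mean = t.mean(axis=0)
stddev = t.std(axis=0)

# H*W*C separate means in total.
n_stats = mean.size
```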

    Batchnorm in conv layer

    This way is totally possible. But a convolutional layer has a special property: filter weights are shared across the input image (you can read about it in detail in this post). That's why it's reasonable to normalize the output in the same way, so that each output value is normalized with a mean and variance computed over B*H*W values taken at all spatial locations.

    Here's how the code looks in this case (again NumPy-style):

    # t is still the incoming tensor of shape [B, H, W, C]
    # but mean and stddev are computed along axes (0, 1, 2) and have shape [C]
    mean = t.mean(axis=(0, 1, 2))
    stddev = t.std(axis=(0, 1, 2))
    # each channel is normalized with one shared mean/stddev pair
    out = (t - mean) / (stddev + 1e-5)

    In total, there are only C means and standard deviations, and each of them is computed over B*H*W values. That's what is meant by an "effective mini-batch": the difference between the two variants is only the axis selection (or, equivalently, the "mini-batch selection").
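    The "effective mini-batch" claim is easy to verify in NumPy: the per-channel mean over axes (0, 1, 2) equals the mean over the B*H*W rows of the flattened tensor (a sketch with arbitrary shapes):

```python
import numpy as np

B, H, W, C = 8, 5, 5, 3
t = np.random.randn(B, H, W, C)

# Per-channel statistics: C values, each computed over B*H*W samples.
mean = t.mean(axis=(0, 1, 2))
stddev = t.std(axis=(0, 1, 2))

# Same result as treating every spatial location as a separate sample
# in an "effective mini-batch" of B*H*W rows.
flat_mean = t.reshape(-1, C).mean(axis=0)

# After normalization, each channel has zero mean (up to float error).
out = (t - mean) / (stddev + 1e-5)
ch_means = out.mean(axis=(0, 1, 2))
```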
