The Keras `BatchNormalization` layer uses `axis=-1` as its default value, and the documentation states that the feature axis is typically the one normalized. Why is this the case?
I suppose that if your mini-batch is an m×n matrix A, i.e. m samples and n features, the normalization axis should be `axis=0`. As you said, what we want is to normalize every feature individually. The default is `axis=-1` in Keras because when `BatchNormalization` is used with convolutional layers, the image tensors usually have the shape (samples, width, height, channels), and the batch samples are normalized along the channel axis (the last axis).
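A quick NumPy illustration of the two cases (the array shapes here are made up for the example):

```python
import numpy as np

# 2D case: a toy mini-batch with m=3 samples and n=4 features.
A = np.arange(12, dtype=np.float32).reshape(3, 4)
per_feature_mean = A.mean(axis=0)   # shape (4,): reduce over the sample axis

# 4D conv case: (samples, width, height, channels).
imgs = np.random.rand(8, 5, 5, 3).astype(np.float32)
per_channel_mean = imgs.mean(axis=(0, 1, 2))  # shape (3,): reduce over every
                                              # axis except the channel axis,
                                              # which is what axis=-1 selects

print(per_feature_mean.shape, per_channel_mean.shape)  # (4,) (3,)
```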
The confusion is due to the meaning of `axis` in `np.mean` versus in `BatchNormalization`.

When we take the mean along an axis, we collapse that dimension and preserve all other dimensions. In your example, `data.mean(axis=0)` collapses the 0-axis, which is the vertical dimension of `data`.
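To make that collapsing behaviour concrete (the `data` array below is my own toy example, not the one from the question):

```python
import numpy as np

data = np.array([[1., 2., 3., 4.],
                 [5., 6., 7., 8.]])

m = data.mean(axis=0)  # collapse the vertical (0) axis
print(m)               # [3. 4. 5. 6.]
print(m.shape)         # (4,): the 0-axis is gone, the 1-axis is preserved
```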
When we compute a `BatchNormalization` along an axis, we preserve the dimensions of the array, and we normalize with respect to the mean and standard deviation over every other axis. So in your 2D example, `BatchNormalization` with `axis=1` is subtracting the mean for `axis=0`, just as you expect. This is why `bn.moving_mean` has shape `(4,)`.
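This is easy to verify (a minimal check using the `tf.keras` API; the 3×4 input is chosen to match the `(4,)` shape above):

```python
import numpy as np
import tensorflow as tf

data = np.random.rand(3, 4).astype(np.float32)  # 3 samples, 4 features

bn = tf.keras.layers.BatchNormalization(axis=1)
bn(data)  # calling the layer once builds its weights

# One moving mean is tracked per entry along axis=1 (per feature),
# each computed over every other axis -- here, the batch axis 0.
print(bn.moving_mean.shape)  # (4,)
```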
I know this post is old, but I'm still answering it because the confusion still lingers on in the Keras documentation. I had to go through the code to figure this out.
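What the code boils down to, roughly, is the following sketch (a paraphrase, not the actual Keras source; the helper name `batch_norm_reduction_axes` is mine):

```python
def batch_norm_reduction_axes(ndim, axis):
    """Paraphrase of how the layer turns its `axis` argument into the
    set of axes it averages over: everything *except* `axis`."""
    axis = axis % ndim                        # resolve negative axes like -1
    return [a for a in range(ndim) if a != axis]

# 2D input (samples, features) with the default axis=-1:
print(batch_norm_reduction_axes(2, -1))  # [0] -> mean/std per feature

# 4D conv input (samples, width, height, channels) with axis=-1:
print(batch_norm_reduction_axes(4, -1))  # [0, 1, 2] -> mean/std per channel
```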