The Keras BatchNormalization layer uses axis=-1 as its default value, and the documentation states that the axis normalized is typically the features axis. Why is this the case?
I suppose what we actually want is to normalize every feature individually, which would suggest using axis=0 instead.
If your mini-batch is a matrix A of shape m×n, i.e. m samples and n features, the normalization axis should be axis=0. As you said, what we want is to normalize every feature individually. The default is axis=-1 in Keras because, when the layer is used after a convolution layer, the image data usually has dimensions (samples, height, width, channels), and the batch samples are normalized along the channel axis (the last axis).
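Here is a minimal sketch illustrating the point, assuming TensorFlow's Keras implementation and a made-up random NHWC batch. With axis=-1 the layer keeps one mean/variance (and gamma/beta) pair per channel and computes the statistics over the batch, height and width axes, which we can reproduce by hand with NumPy:

```python
import numpy as np
import tensorflow as tf

# Hypothetical image batch in NHWC layout: (samples, height, width, channels)
x = np.random.rand(8, 32, 32, 3).astype("float32")

# axis=-1 (the default) treats the last axis -- the channel axis -- as the
# feature axis: one set of statistics and learnable parameters per channel.
bn = tf.keras.layers.BatchNormalization(axis=-1)
y = bn(x, training=True)  # training=True -> use the batch statistics

# Equivalent normalization done manually: reduce over every axis
# except the channel axis.
mean = x.mean(axis=(0, 1, 2), keepdims=True)
var = x.var(axis=(0, 1, 2), keepdims=True)
x_hat = (x - mean) / np.sqrt(var + 1e-3)  # 1e-3 is the layer's default epsilon

# Matches the layer output because gamma is initialized to 1 and beta to 0.
print(np.allclose(y.numpy(), x_hat, atol=1e-4))
```

By the same logic, for a plain (samples, features) matrix the last axis is the feature axis, so axis=-1 still ends up computing one mean and variance per feature across the batch.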