I understand that Batch Normalisation helps in faster training by turning the activation towards unit Gaussian distribution and thus tackling vanishing gradients problem. Batch
IN provide visual and appearance in-variance and BN accelerate training and preserve discriminative feature. IN is preferred in Shallow layer(starting layer of CNN) so remove appearance variation and BN is preferred in deep layers(last CNN layer) should be reduce in order to maintain discrimination.