I have a question about using normalization during SGD training. When training with SGD, the batch size is 1, and this makes batch normalization impossible to calculate.
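To make the concern concrete, here is a minimal NumPy sketch of training-mode batch normalization for a fully connected layer. It assumes statistics are taken per feature over the batch axis (axis 0), and it fixes the learnable scale and shift (`gamma`, `beta`) to constants for illustration. The helper name `batch_norm_train` is hypothetical. With a batch of one sample, the batch mean equals the sample itself and the variance is zero, so every normalized value collapses to zero and the layer outputs only `beta`.

```python
import numpy as np

def batch_norm_train(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Hypothetical training-mode batch norm (illustration only).

    x has shape (batch_size, num_features); statistics are computed
    per feature over the batch axis.
    """
    mean = x.mean(axis=0)                     # per-feature mean over the batch
    var = x.var(axis=0)                       # per-feature (biased) variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)   # normalize; eps avoids division by zero
    return gamma * x_hat + beta               # scale and shift

rng = np.random.default_rng(0)
x_batch = rng.normal(size=(32, 4))   # a mini-batch of 32 samples, 4 features
x_single = x_batch[:1]               # batch size 1, as in the question

print(batch_norm_train(x_batch).std(axis=0))   # close to 1 for each feature
print(batch_norm_train(x_single))              # all zeros: x - mean is 0 and var is 0
```

The `eps` term keeps the division defined, but it does not rescue the statistics: the output carries no information about the input. This sketch covers only the per-feature (fully connected) case; batch normalization for convolutional layers also averages over spatial positions, so its statistics are not degenerate in the same way at batch size 1.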