Normalization with Stochastic Gradient Descent

无人共我 2020-11-28 17:13

I have a question about using normalization during SGD training. When training with SGD, the batch size is 1, and this makes batch normalization impossible to calculate...
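
To make the concern concrete, here is a minimal sketch in plain Python of what batch statistics look like when the batch holds a single sample. The `batch_norm` helper is hypothetical (it is not taken from any library) and it leaves out the learned scale and shift parameters. Under those assumptions, the mean equals the sample itself and the variance is zero, so the statistics can still be computed, but every normalized value collapses to zero and the input information is lost.

```python
import math


def batch_norm(batch, eps=1e-5):
    """Normalize each feature over the batch dimension.

    Hypothetical helper for illustration only: no learned scale/shift,
    no running statistics. `batch` is a list of equal-length lists of floats.
    """
    n = len(batch)
    num_features = len(batch[0])
    out = [[0.0] * num_features for _ in range(n)]
    for j in range(num_features):
        column = [row[j] for row in batch]
        mean = sum(column) / n
        # Biased variance over the batch, as used for training-time statistics.
        var = sum((x - mean) ** 2 for x in column) / n
        for i in range(n):
            out[i][j] = (batch[i][j] - mean) / math.sqrt(var + eps)
    return out


# Batch size 1: mean == sample and variance == 0, so every output is 0.0.
single = batch_norm([[3.0, -7.0, 42.0]])
print(single)

# Batch size 2: the statistics are informative and outputs are close to -1 and +1.
pair = batch_norm([[1.0, 2.0], [3.0, 6.0]])
print(pair)
```

With more than one sample per step the same function gives non-degenerate outputs, which is why the issue is tied to the batch size rather than to the optimizer itself.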
