Computing the derivative with respect to the input of a network with batch normalization: training vs inference time

谎友^ 2020-12-14 17:03

I am noticing different behavior when I compute the derivative of a network's output with respect to its input when the network contains a batch normalization layer.
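This difference is expected. In training mode BatchNorm normalizes with the statistics of the current batch, so one sample's output (and hence its input gradient) depends on every other sample in the batch; in inference mode it uses the stored running statistics, which are constants. A minimal sketch, assuming PyTorch and a single `BatchNorm1d` layer standing in for the network (the question does not specify a framework):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(3)        # toy stand-in for the network
x = torch.randn(4, 3)         # batch of 4 samples, 3 features

# Training mode: normalization uses the batch mean/variance, so
# sample 0's output depends on every sample in the batch.
bn.train()
x_train = x.clone().requires_grad_(True)
g_train, = torch.autograd.grad(bn(x_train)[0].sum(), x_train)

# Inference mode: normalization uses the running statistics, which are
# constants, so sample 0's output depends only on sample 0.
bn.eval()
x_eval = x.clone().requires_grad_(True)
g_eval, = torch.autograd.grad(bn(x_eval)[0].sum(), x_eval)

# Cross-sample gradient (sample 0's output w.r.t. sample 1's input):
print(g_train[1])  # nonzero: samples are coupled through batch stats
print(g_eval[1])   # exactly zero: no cross-sample dependence
```

So the gradients genuinely differ between `model.train()` and `model.eval()`; if you want the inference-time derivative, put the model in eval mode before calling autograd.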
