Non-deterministic Gradient Computation


Question


I realized that my models end up being different every time I train them, even though I keep the TensorFlow random seed the same.

I verified that:

  • Initialization is deterministic; the weights are identical before the first update.
  • Inputs are deterministic. In fact, various forward computations, including the loss, are identical for the very first batch.
  • The gradients for the first batch are different. Concretely, I'm comparing the outputs of tf.gradients(loss, train_variables). While loss and train_variables have identical values, the gradients are sometimes different for some of the Variables. The differences are quite significant (sometimes the sum of absolute differences for a single variable's gradient is greater than 1); see the comparison sketch after this list.
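
A minimal, self-contained sketch of that comparison (names like `loss` and `train_variables` are illustrative, not from my actual code): it builds a tiny model, evaluates tf.gradients twice on the same batch, and reports the sum of absolute differences per variable.

```python
import numpy as np
import tensorflow as tf

tf.set_random_seed(0)  # graph-level seed

x = tf.placeholder(tf.float32, shape=[None, 4])
y = tf.placeholder(tf.float32, shape=[None, 1])

w = tf.get_variable("w", shape=[4, 1])
b = tf.get_variable("b", shape=[1], initializer=tf.zeros_initializer())
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) + b - y))

train_variables = tf.trainable_variables()
grads = tf.gradients(loss, train_variables)

# Fixed batch so the forward inputs are identical in both evaluations.
batch = {x: np.random.RandomState(0).randn(8, 4),
         y: np.random.RandomState(1).randn(8, 1)}

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    g1 = sess.run(grads, feed_dict=batch)
    g2 = sess.run(grads, feed_dict=batch)  # same batch, same weights

# Sum of absolute differences per variable; 0.0 means the gradients match.
for v, a, b_val in zip(train_variables, g1, g2):
    print(v.name, np.abs(a - b_val).sum())
```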

I conclude that it's the gradient computation that causes the non-determinism. I had a look at this question, and the problem persists when running on a CPU with intra_op_parallelism_threads=1 and inter_op_parallelism_threads=1.
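
For reference, this is the single-threaded CPU configuration I mean, using the TF 1.x session API (a sketch; the model-building code is omitted):

```python
import tensorflow as tf

tf.set_random_seed(42)  # graph-level seed

config = tf.ConfigProto(
    intra_op_parallelism_threads=1,  # single thread within each op
    inter_op_parallelism_threads=1,  # single thread across ops
    device_count={"GPU": 0},         # force CPU execution
)

with tf.Session(config=config) as sess:
    pass  # build and run the model here
```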

How can the backward pass be non-deterministic when the forward pass isn't? How could I debug this further?


Answer 1:


This answer might seem a little obvious, but do you use some kind of non-deterministic regularization such as dropout? Given that dropout randomly "drops" some connections during training, it may be causing that difference in the gradients.
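
If dropout turns out to be the culprit, one way to make it reproducible is to pass an explicit op-level seed in addition to the graph-level seed. A sketch, assuming a TF 1.x graph (the placeholder shape is just an example):

```python
import tensorflow as tf

tf.set_random_seed(42)  # graph-level seed

x = tf.placeholder(tf.float32, shape=[None, 128])
# Op-level seed makes this particular dropout mask reproducible across runs.
dropped = tf.nn.dropout(x, keep_prob=0.5, seed=7)
```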

Edit: Similar questions:

  • How to get stable results with TensorFlow, setting random seed
  • Tensorflow not being deterministic, where it should

Edit 2: This seems to be an issue with TensorFlow's implementation. See the following open issues in GitHub:

  • Problems Getting TensorFlow to behave Deterministically
  • Non-deterministic behaviour when ran on GPU


Source: https://stackoverflow.com/questions/42412660/non-deterministic-gradient-computation
