Non-deterministic Gradient Computation


Question


I realized that my models end up being different every time I train them, even though I keep the TensorFlow random seed the same.

I verified that:

  • Initialization is deterministic; the weights are identical before the first update.
  • Inputs are deterministic. In fact, various forward computations, including the loss, are identical for the very first batch.
  • The gradients for the first batch are different. Concretely, I'm comparing the outputs of tf.gradients(loss, train_variables). While loss and train_variables have identical values, the gradients are sometimes different for some of the Variables. The differences are quite significant (sometimes the sum of absolute differences for a single variable's gradient is greater than 1); see the comparison sketch after this list.
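
A minimal, self-contained sketch of that comparison (names like `loss` and `train_variables` are illustrative, not from my actual code): it builds a tiny model, evaluates tf.gradients twice on the same batch, and reports the sum of absolute differences per variable.

```python
import numpy as np
import tensorflow as tf

tf.set_random_seed(0)  # graph-level seed

x = tf.placeholder(tf.float32, shape=[None, 4])
y = tf.placeholder(tf.float32, shape=[None, 1])

w = tf.get_variable("w", shape=[4, 1])
b = tf.get_variable("b", shape=[1], initializer=tf.zeros_initializer())
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) + b - y))

train_variables = tf.trainable_variables()
grads = tf.gradients(loss, train_variables)

# Fixed batch so the forward inputs are identical in both evaluations.
batch = {x: np.random.RandomState(0).randn(8, 4),
         y: np.random.RandomState(1).randn(8, 1)}

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    g1 = sess.run(grads, feed_dict=batch)
    g2 = sess.run(grads, feed_dict=batch)  # same batch, same weights

# Sum of absolute differences per variable; 0.0 means the gradients match.
for v, a, b_val in zip(train_variables, g1, g2):
    print(v.name, np.abs(a - b_val).sum())
```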

I conclude that it's the gradient computation that causes the non-determinism. I had a look at this question, and the problem persists when running on a CPU with intra_op_parallelism_threads=1 and inter_op_parallelism_threads=1.
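
For reference, this is the single-threaded CPU configuration I mean, using the TF 1.x session API (a sketch; the model-building code is omitted):

```python
import tensorflow as tf

tf.set_random_seed(42)  # graph-level seed

config = tf.ConfigProto(
    intra_op_parallelism_threads=1,  # single thread within each op
    inter_op_parallelism_threads=1,  # single thread across ops
    device_count={"GPU": 0},         # force CPU execution
)

with tf.Session(config=config) as sess:
    pass  # build and run the model here
```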

How can the backward pass be non-deterministic when the forward pass isn't? How could I debug this further?


Answer 1:


This answer might seem a little obvious, but do you use some kind of non-deterministic regularization such as dropout? Given that dropout randomly "drops" some connections during training, it may be causing that difference in the gradients.
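
If dropout turns out to be the culprit, one way to make it reproducible is to pass an explicit op-level seed in addition to the graph-level seed. A sketch, assuming a TF 1.x graph (the placeholder shape is just an example):

```python
import tensorflow as tf

tf.set_random_seed(42)  # graph-level seed

x = tf.placeholder(tf.float32, shape=[None, 128])
# Op-level seed makes this particular dropout mask reproducible across runs.
dropped = tf.nn.dropout(x, keep_prob=0.5, seed=7)
```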

Edit: Similar questions:

  • How to get stable results with TensorFlow, setting random seed
  • Tensorflow not being deterministic, where it should

Edit 2: This seems to be an issue with TensorFlow's implementation. See the following open issues in GitHub:

  • Problems Getting TensorFlow to behave Deterministically
  • Non-deterministic behaviour when ran on GPU


Source: https://stackoverflow.com/questions/42412660/non-deterministic-gradient-computation
