Determinism in tensorflow gradient updates?

一向 2020-12-17 01:44

So I have a very simple NN script written in TensorFlow, and I am having a hard time trying to trace down where some "randomness" is coming from.

I have record

3 Answers
  • 2020-12-17 01:57

    There's a good chance you could get deterministic results if you run your network on CPU (export CUDA_VISIBLE_DEVICES=), with a single thread in the Eigen thread pool (tf.Session(config=tf.ConfigProto(intra_op_parallelism_threads=1))), one Python thread (no multi-threaded queue runners of the kind you get from ops like tf.train.batch), and a single well-defined operation order. Setting inter_op_parallelism_threads=1 may also help in some scenarios.
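    As a minimal sketch of that setup (TF1-style API; the training loop itself is omitted and would be your own):

        import os
        import tensorflow as tf

        # Hide all GPUs so every op runs on the CPU. Must be set before
        # TensorFlow initializes any GPU context.
        os.environ["CUDA_VISIBLE_DEVICES"] = ""

        # Restrict both thread pools to a single thread so kernels and
        # independent ops execute in one well-defined order.
        config = tf.ConfigProto(
            intra_op_parallelism_threads=1,
            inter_op_parallelism_threads=1,
        )

        with tf.Session(config=config) as sess:
            sess.run(tf.global_variables_initializer())
            # ... run training steps here ...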

    One issue is that floating point addition/multiplication is non-associative, so one foolproof way to get deterministic results is to use integer arithmetic or quantized values.
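    A tiny illustration of the non-associativity, in plain Python (nothing TensorFlow-specific):

        # Floating point addition is not associative: summing the same
        # values in a different order can give a different result.
        a, b, c = 0.1, 0.2, 0.3
        print((a + b) + c)            # 0.6000000000000001
        print(a + (b + c))            # 0.6
        print((a + b) + c == a + (b + c))  # False on IEEE-754 doubles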

    Barring that, you could isolate which operation is non-deterministic, and try to avoid using that op. For instance, there's the tf.add_n op, whose documentation doesn't say anything about the order in which it sums its inputs, and different orders can produce different results.
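    One possible workaround (my own sketch, not something the docs prescribe): replace tf.add_n with an explicit left-to-right chain of additions, which pins the summation order at graph-construction time:

        import tensorflow as tf

        def add_n_ordered(tensors):
            """Sum tensors strictly left-to-right, unlike tf.add_n,
            whose reduction order is unspecified."""
            total = tensors[0]
            for t in tensors[1:]:
                total = total + t  # fixed, explicit order
            return total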

    Getting deterministic results is a bit of an uphill battle because determinism is in conflict with performance, and performance is usually the goal that gets more attention. An alternative to trying to get exactly the same numbers on reruns is to focus on numerical stability -- if your algorithm is stable, then you will get reproducible results (i.e., the same number of misclassifications) even though the exact parameter values may differ slightly.

  • 2020-12-17 02:12

    The TensorFlow reduce_sum op is specifically known to be non-deterministic. Furthermore, reduce_sum is used for calculating bias gradients.

    This post discusses a workaround that avoids reduce_sum (i.e., taking the dot product of a vector with a vector of all 1's gives the same value as reduce_sum over that vector).
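    A sketch of that workaround (TF1-style; the placeholder shape and names are my own assumptions):

        import tensorflow as tf

        v = tf.placeholder(tf.float32, shape=[None])  # the vector to sum

        # Express reduce_sum(v) as a dot product with a ones vector, so
        # the (potentially non-deterministic) reduce_sum kernel is avoided.
        v_row = tf.reshape(v, [1, -1])              # shape [1, n]
        ones_col = tf.ones([tf.shape(v)[0], 1])     # shape [n, 1]
        dot_sum = tf.matmul(v_row, ones_col)[0, 0]  # scalar, equals sum(v)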

  • 2020-12-17 02:16

    I have faced the same problem. The working solution for me was to:

    1- Use tf.set_random_seed(1) so that all TensorFlow random ops get the same seed on every new run.

    2- Train the model on the CPU rather than the GPU, to avoid the GPU's non-deterministic operations. A sketch combining both points follows below.
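    As a minimal sketch (TF1-style; the tiny layer is just a placeholder example of mine, not the asker's model):

        import tensorflow as tf

        # 1) Fix the graph-level seed so random ops (initializers,
        #    dropout, shuffling) produce the same sequence on every run.
        tf.set_random_seed(1)

        # 2) Pin the model to the CPU to sidestep non-deterministic
        #    GPU kernels.
        with tf.device('/cpu:0'):
            x = tf.placeholder(tf.float32, shape=[None, 10])
            w = tf.get_variable('w', shape=[10, 1])
            y = tf.matmul(x, w)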
