Determinism in tensorflow gradient updates?

一向 2020-12-17 01:44

So I have a very simple NN script written in TensorFlow, and I am having a hard time trying to trace down where some "randomness" is coming from.

I have record

3 Answers
  • 2020-12-17 01:57

    There's a good chance you could get deterministic results if you run your network on CPU (export CUDA_VISIBLE_DEVICES=), with a single thread in the Eigen thread pool (tf.Session(config=tf.ConfigProto(intra_op_parallelism_threads=1))), one Python thread (no multi-threaded queue runners of the kind you get from ops like tf.train.batch), and a single well-defined operation order. Setting inter_op_parallelism_threads=1 may also help in some scenarios.
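    As a minimal sketch of that setup (TF1-style API; the training loop itself is omitted and would be your own):

        import os
        import tensorflow as tf

        # Hide all GPUs so every op runs on the CPU. Must be set before
        # TensorFlow initializes any GPU context.
        os.environ["CUDA_VISIBLE_DEVICES"] = ""

        # Restrict both thread pools to a single thread so kernels and
        # independent ops execute in one well-defined order.
        config = tf.ConfigProto(
            intra_op_parallelism_threads=1,
            inter_op_parallelism_threads=1,
        )

        with tf.Session(config=config) as sess:
            sess.run(tf.global_variables_initializer())
            # ... run training steps here ...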

    One issue is that floating point addition/multiplication is non-associative, so one foolproof way to get deterministic results is to use integer arithmetic or quantized values.
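    A tiny illustration of the non-associativity, in plain Python (nothing TensorFlow-specific):

        # Floating point addition is not associative: summing the same
        # values in a different order can give a different result.
        a, b, c = 0.1, 0.2, 0.3
        print((a + b) + c)            # 0.6000000000000001
        print(a + (b + c))            # 0.6
        print((a + b) + c == a + (b + c))  # False on IEEE-754 doubles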

    Barring that, you could isolate which operation is non-deterministic, and try to avoid using that op. For instance, there's the tf.add_n op, whose documentation doesn't say anything about the order in which it sums its inputs, and different orders can produce different results.
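    One possible workaround (my own sketch, not something the docs prescribe): replace tf.add_n with an explicit left-to-right chain of additions, which pins the summation order at graph-construction time:

        import tensorflow as tf

        def add_n_ordered(tensors):
            """Sum tensors strictly left-to-right, unlike tf.add_n,
            whose reduction order is unspecified."""
            total = tensors[0]
            for t in tensors[1:]:
                total = total + t  # fixed, explicit order
            return total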

    Getting deterministic results is a bit of an uphill battle because determinism is in conflict with performance, and performance is usually the goal that gets more attention. An alternative to trying to get exactly the same numbers on reruns is to focus on numerical stability -- if your algorithm is stable, then you will get reproducible results (i.e., the same number of misclassifications) even though the exact parameter values may differ slightly.

  • 2020-12-17 02:12

    The TensorFlow reduce_sum op is specifically known to be non-deterministic. Furthermore, reduce_sum is used for calculating bias gradients.

    This post discusses a workaround that avoids reduce_sum (i.e., taking the dot product of a vector with a vector of all 1's gives the same value as reduce_sum over that vector).
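    A sketch of that workaround (TF1-style; the placeholder shape and names are my own assumptions):

        import tensorflow as tf

        v = tf.placeholder(tf.float32, shape=[None])  # the vector to sum

        # Express reduce_sum(v) as a dot product with a ones vector, so
        # the (potentially non-deterministic) reduce_sum kernel is avoided.
        v_row = tf.reshape(v, [1, -1])              # shape [1, n]
        ones_col = tf.ones([tf.shape(v)[0], 1])     # shape [n, 1]
        dot_sum = tf.matmul(v_row, ones_col)[0, 0]  # scalar, equals sum(v)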

  • 2020-12-17 02:16

    I have faced the same problem. The working solution for me was to:

    1- Use tf.set_random_seed(1) so that all TensorFlow random ops get the same seed on every new run.

    2- Train the model on the CPU rather than the GPU, to avoid the GPU's non-deterministic operations. A sketch combining both points follows below.
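    As a minimal sketch (TF1-style; the tiny layer is just a placeholder example of mine, not the asker's model):

        import tensorflow as tf

        # 1) Fix the graph-level seed so random ops (initializers,
        #    dropout, shuffling) produce the same sequence on every run.
        tf.set_random_seed(1)

        # 2) Pin the model to the CPU to sidestep non-deterministic
        #    GPU kernels.
        with tf.device('/cpu:0'):
            x = tf.placeholder(tf.float32, shape=[None, 10])
            w = tf.get_variable('w', shape=[10, 1])
            y = tf.matmul(x, w)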
