How to handle non-determinism when training on a GPU?

逝去的感伤 · 2020-12-03 05:09

While tuning the hyperparameters to get my model to perform better, I noticed that the score I get (and hence the model that is created) is different every time I run the code.

2 Answers
  •  心在旅途
    2020-12-03 05:58

    To get the Keras MNIST CNN example (https://github.com/keras-team/keras/blob/master/examples/mnist_cnn.py) to train deterministically on my GPU (a 1050 Ti), I did the following:

    • Set PYTHONHASHSEED='SOMESEED'. I do this before starting the Python kernel.
    • Set seeds for the random number generators (I'm not sure all of them are needed for MNIST):
        import random as python_random
        import numpy as np
        import tensorflow as tf

        python_random.seed(42)
        np.random.seed(42)
        tf.set_random_seed(42)
    
    • Make TF select deterministic GPU algorithms. Either:
        import tensorflow as tf
        from tfdeterminism import patch
        patch()  # patches TF to use deterministic GPU op implementations
    

    Or:

        import os
        os.environ['TF_CUDNN_DETERMINISTIC'] = '1'  # set before TensorFlow is imported
        import tensorflow as tf
    

    Note that the resulting loss is repeatable with either method of selecting deterministic TF algorithms, but the two methods produce different losses from each other. Also, the steps above don't make a more complicated model I'm using repeatable.
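    Putting the pieces above together, here is a minimal sketch of how I arrange them in one script (assuming TF 1.x, where tf.set_random_seed and TF_CUDNN_DETERMINISTIC apply; the seed value 42 is arbitrary):

        # Minimal sketch of the steps above, assuming TF 1.x and a CUDA GPU
        import os
        os.environ['PYTHONHASHSEED'] = '42'          # ideally exported in the shell before Python starts
        os.environ['TF_CUDNN_DETERMINISTIC'] = '1'   # ask for deterministic cuDNN algorithms

        import random as python_random
        import numpy as np
        import tensorflow as tf

        python_random.seed(42)
        np.random.seed(42)
        tf.set_random_seed(42)

        # ... build and train the MNIST model as in the Keras example ...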

    Check out https://github.com/NVIDIA/framework-determinism for a more current answer.
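    For illustration only (my assumption about newer releases, not something I tested here): recent TF 2.x versions expose determinism controls directly, which is roughly what that repository points to:

        # Assumed sketch for newer TF 2.x (roughly 2.9+); check the NVIDIA repo for specifics
        import tensorflow as tf

        tf.keras.utils.set_random_seed(42)               # seeds Python, NumPy and TF together
        tf.config.experimental.enable_op_determinism()   # request deterministic op implementations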

    A side note:

    For cuDNN 8.0.1, non-deterministic algorithms exist for:

    (from https://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html)

    • cudnnConvolutionBackwardFilter when CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0 or CUDNN_CONVOLUTION_BWD_FILTER_ALGO_3 is used
    • cudnnConvolutionBackwardData when CUDNN_CONVOLUTION_BWD_DATA_ALGO_0 is used
    • cudnnPoolingBackward when CUDNN_POOLING_MAX is used
    • cudnnSpatialTfSamplerBackward
    • cudnnCTCLoss and cudnnCTCLoss_v8 when CUDNN_CTC_LOSS_ALGO_NON_DETERMINISTIC is used
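    A hypothetical way to probe this from Keras (my own suggestion, not from the cuDNN docs): MaxPooling2D layers can end up on the cudnnPoolingBackward / CUDNN_POOLING_MAX path listed above, so temporarily swapping them for AveragePooling2D is one way to check whether pooling is the source of residual run-to-run variation:

        # Hypothetical experiment: compare a max-pooling model against an
        # average-pooling one to see whether the pooling backward pass is
        # responsible for non-repeatable losses.
        from tensorflow import keras

        def tiny_model(pool_layer):
            return keras.Sequential([
                keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
                pool_layer,
                keras.layers.Flatten(),
                keras.layers.Dense(10, activation='softmax'),
            ])

        model_max = tiny_model(keras.layers.MaxPooling2D())
        model_avg = tiny_model(keras.layers.AveragePooling2D())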
