How to handle non-determinism when training on a GPU?

逝去的感伤 · 2020-12-03 05:09

While tuning the hyperparameters to get my model to perform better, I noticed that the score I get (and hence the model that is created) is different every time I run the code.

2 Answers
  •  心在旅途
    2020-12-03 05:58

    To get the Keras MNIST CNN example (https://github.com/keras-team/keras/blob/master/examples/mnist_cnn.py) to train deterministically on my GPU (a 1050 Ti), I did the following:

    • Set PYTHONHASHSEED='SOMESEED'. I do this before starting the Python kernel.
    • Set seeds for the random number generators (I'm not sure all of them are needed for MNIST):
        import random as python_random
        import numpy as np
        import tensorflow as tf

        python_random.seed(42)
        np.random.seed(42)
        tf.set_random_seed(42)
    
    • Make TF select deterministic GPU algorithms. Either:
        import tensorflow as tf
        from tfdeterminism import patch
        patch()  # patches TF to use deterministic GPU op implementations
    

    Or:

        import os
        os.environ['TF_CUDNN_DETERMINISTIC'] = '1'  # set before TensorFlow is imported
        import tensorflow as tf
    

    Note that the resulting loss is repeatable with either method of selecting deterministic TF algorithms, but the two methods produce different losses from each other. Also, the steps above don't make a more complicated model I'm using repeatable.
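    Putting the pieces above together, here is a minimal sketch of how I arrange them in one script (assuming TF 1.x, where tf.set_random_seed and TF_CUDNN_DETERMINISTIC apply; the seed value 42 is arbitrary):

        # Minimal sketch of the steps above, assuming TF 1.x and a CUDA GPU
        import os
        os.environ['PYTHONHASHSEED'] = '42'          # ideally exported in the shell before Python starts
        os.environ['TF_CUDNN_DETERMINISTIC'] = '1'   # ask for deterministic cuDNN algorithms

        import random as python_random
        import numpy as np
        import tensorflow as tf

        python_random.seed(42)
        np.random.seed(42)
        tf.set_random_seed(42)

        # ... build and train the MNIST model as in the Keras example ...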

    Check out https://github.com/NVIDIA/framework-determinism for a more current answer.
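    For illustration only (my assumption about newer releases, not something I tested here): recent TF 2.x versions expose determinism controls directly, which is roughly what that repository points to:

        # Assumed sketch for newer TF 2.x (roughly 2.9+); check the NVIDIA repo for specifics
        import tensorflow as tf

        tf.keras.utils.set_random_seed(42)               # seeds Python, NumPy and TF together
        tf.config.experimental.enable_op_determinism()   # request deterministic op implementations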

    A side note:

    For cuDNN 8.0.1, non-deterministic algorithms exist for:

    (from https://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html)

    • cudnnConvolutionBackwardFilter when CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0 or CUDNN_CONVOLUTION_BWD_FILTER_ALGO_3 is used
    • cudnnConvolutionBackwardData when CUDNN_CONVOLUTION_BWD_DATA_ALGO_0 is used
    • cudnnPoolingBackward when CUDNN_POOLING_MAX is used
    • cudnnSpatialTfSamplerBackward
    • cudnnCTCLoss and cudnnCTCLoss_v8 when CUDNN_CTC_LOSS_ALGO_NON_DETERMINISTIC is used
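    A hypothetical way to probe this from Keras (my own suggestion, not from the cuDNN docs): MaxPooling2D layers can end up on the cudnnPoolingBackward / CUDNN_POOLING_MAX path listed above, so temporarily swapping them for AveragePooling2D is one way to check whether pooling is the source of residual run-to-run variation:

        # Hypothetical experiment: compare a max-pooling model against an
        # average-pooling one to see whether the pooling backward pass is
        # responsible for non-repeatable losses.
        from tensorflow import keras

        def tiny_model(pool_layer):
            return keras.Sequential([
                keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
                pool_layer,
                keras.layers.Flatten(),
                keras.layers.Dense(10, activation='softmax'),
            ])

        model_max = tiny_model(keras.layers.MaxPooling2D())
        model_avg = tiny_model(keras.layers.AveragePooling2D())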
