Reproducible results using Keras with TensorFlow backend

Submitted by 此生再无相见时 on 2019-11-27 06:03:51

Question


I am using Keras with the TensorFlow backend to build a deep-learning LSTM model. Each time I run the model, the results are different. Is there a way to fix the seed so that the results are reproducible? Thank you!


Answer 1:


As @Poete_Maudit said here: How to get reproducible results in keras

To get reproducible results, you will have to do the following at the very beginning of your script (note that this forces the run onto a single CPU thread):

# Seed value (it may differ for each of the seeding steps below)
seed_value = 0

# 1. Set `PYTHONHASHSEED` environment variable at a fixed value
import os
os.environ['PYTHONHASHSEED']=str(seed_value)

# 2. Set `python` built-in pseudo-random generator at a fixed value
import random
random.seed(seed_value)

# 3. Set `numpy` pseudo-random generator at a fixed value
import numpy as np
np.random.seed(seed_value)

# 4. Set `tensorflow` pseudo-random generator at a fixed value
import tensorflow as tf
tf.set_random_seed(seed_value)

# 5. Configure a new global `tensorflow` session
from keras import backend as K
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)

Note: you can no longer get reproducible results simply by running the command PYTHONHASHSEED=0 python3 script.py, as https://keras.io/getting-started/faq/#how-can-i-obtain-reproducible-results-using-keras-during-development might lead you to think; you have to set PYTHONHASHSEED via os.environ inside your script, as in step 1. Also, this does NOT work when running on a GPU.
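As a quick sanity check (a minimal sketch of my own with a made-up toy model and data, not taken from the linked answer), you can run the same script twice with the seeding block above at the top; on CPU the printed predictions should be identical on both runs:

# Hypothetical toy example to verify that the seeding block works:
# run the script twice; the printed predictions should match exactly on CPU.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

X = np.random.rand(100, 8)   # reproducible because numpy was seeded above
y = np.random.rand(100, 1)

model = Sequential([
    Dense(16, activation='relu', input_shape=(8,)),
    Dense(1),
])
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=3, batch_size=16, verbose=0)

print(model.predict(X[:3]))  # identical numbers on every run if seeding worked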




Answer 2:


There is inherent randomness in deep learning that leads to non-reproducible results, but you can control it to a certain extent.

Since we are using a deep neural network, several sources of randomness affect reproducibility and can lead to different results, such as the following (see the sketch after this list):

  • Randomness in Initialization, such as weights.

  • Randomness in Regularization, such as dropout.

  • Randomness in Layers.

  • Randomness in Optimization.
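For example, the initialization and dropout randomness above can each be pinned at the layer level. The following is a hedged sketch using standard Keras seed arguments; the model itself is made up for illustration:

# Sketch: fixing per-layer randomness with explicit seed arguments
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.initializers import glorot_uniform

seed_value = 0

model = Sequential([
    # Randomness in initialization: pass a seeded initializer explicitly
    Dense(32, activation='relu', input_shape=(10,),
          kernel_initializer=glorot_uniform(seed=seed_value)),
    # Randomness in regularization: Dropout takes its own seed for the mask
    Dropout(0.5, seed=seed_value),
    Dense(1, kernel_initializer=glorot_uniform(seed=seed_value)),
])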

There are several ways to mitigate this. One option is to report summary statistics over multiple runs. Another method, which gives more reproducible results, is to set a random seed for numpy and/or tensorflow; see:

https://docs.scipy.org/doc/numpy-1.12.0/reference/generated/numpy.random.seed.html

https://www.tensorflow.org/api_docs/python/tf/set_random_seed

For operations that run on a GPU, we can request a deterministic implementation instead of the default non-deterministic one. For NVIDIA graphics cards, see: docs.nvidia.com/cuda
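For instance (this is an assumption about newer TensorFlow versions, not something covered by the original answer): from roughly TF 2.1 onward, deterministic GPU/cuDNN kernels can be requested via an environment variable, while older 1.x releases needed NVIDIA's separate tensorflow-determinism patch. A hedged sketch:

import os

# Assumption: honoured from roughly TF 2.1 onward; set it before TensorFlow
# does any GPU work.
os.environ['TF_DETERMINISTIC_OPS'] = '1'   # request deterministic GPU/cuDNN kernels

# In TF 2.8+ the equivalent API call is:
# import tensorflow as tf
# tf.config.experimental.enable_op_determinism()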




Answer 3:


Basically, the key to making the results reproducible is to disable the GPU; this is very important. To do this, just include

import os
import tensorflow as tf
import numpy as np
import random as rn

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = ""

sd = 1 # Here sd means seed.
np.random.seed(sd)
rn.seed(sd)
os.environ['PYTHONHASHSEED']=str(sd)

from keras import backend as K
config = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
tf.set_random_seed(sd)
sess = tf.Session(graph=tf.get_default_graph(), config=config)
K.set_session(sess)

at the very beginning of your code. Hope this helps.
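As an optional sanity check (a sketch of my own using a TensorFlow utility, not part of the original answer), you can confirm that the GPU is really hidden before training:

# After hiding the GPU with CUDA_VISIBLE_DEVICES above, only CPU devices
# should be listed by TensorFlow.
from tensorflow.python.client import device_lib

devices = [d.device_type for d in device_lib.list_local_devices()]
print(devices)              # expect something like ['CPU'] with no 'GPU' entry
assert 'GPU' not in devices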



Source: https://stackoverflow.com/questions/48631576/reproducible-results-using-keras-with-tensorflow-backend
