Question
I get different results (test accuracy) every time I run the imdb_lstm.py
example from Keras framework (https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py)
The code contains np.random.seed(1337)
at the top, before any Keras imports. This should prevent it from generating different numbers on every run. What am I missing?
UPDATE: How to repro:
- Install Keras (http://keras.io/)
- Execute https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py a few times. It will train the model and output test accuracy.
Expected result: Test accuracy is the same on every run.
Actual result: Test accuracy is different on every run.
UPDATE2: I'm running it on Windows 8.1 with MinGW/msys; module versions:
theano 0.7.0
numpy 1.8.1
scipy 0.14.0c1
UPDATE3: I narrowed the problem down a bit. If I run the example on the GPU (setting the Theano flag device=gpu0) then I get a different test accuracy every time, but if I run it on the CPU then everything works as expected. My graphics card: NVIDIA GeForce GT 635.
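The GPU finding above fits the general picture: np.random.seed fixes only NumPy's generator, while other sources of randomness (Python's own generators, and especially nondeterministic GPU/cuDNN kernels) keep their own state. A minimal stdlib sketch of this separation, using two independent generators as an analogy:

```python
import random

# Seeding the module-level generator makes *its* stream reproducible...
random.seed(1337)
a = [random.random() for _ in range(3)]
random.seed(1337)
b = [random.random() for _ in range(3)]
assert a == b

# ...but an independent generator (seeded from OS entropy) is untouched,
# just as GPU/cuDNN kernels keep state outside the generator that
# np.random.seed controls.
other = random.Random()
```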
Answer 1:
You can find the answer at the Keras docs: https://keras.io/getting-started/faq/#how-can-i-obtain-reproducible-results-using-keras-during-development.
In short, to be absolutely sure that you will get reproducible results with your Python script on one computer's/laptop's CPU, you have to do the following:
- Set the PYTHONHASHSEED environment variable to a fixed value
- Set the Python built-in pseudo-random generator to a fixed value
- Set the numpy pseudo-random generator to a fixed value
- Set the tensorflow pseudo-random generator to a fixed value
- Configure a new global tensorflow session
Following the Keras link at the top, the source code I am using is the following:
# Seed value
# Apparently you may use different seed values at each stage
seed_value = 0
# 1. Set the `PYTHONHASHSEED` environment variable at a fixed value
import os
os.environ['PYTHONHASHSEED'] = str(seed_value)
# 2. Set the `python` built-in pseudo-random generator at a fixed value
import random
random.seed(seed_value)
# 3. Set the `numpy` pseudo-random generator at a fixed value
import numpy as np
np.random.seed(seed_value)
# 4. Set the `tensorflow` pseudo-random generator at a fixed value
import tensorflow as tf
tf.set_random_seed(seed_value)
# 5. Configure a new global `tensorflow` session
from keras import backend as K
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)
Needless to say, you do not have to specify any seed or random_state for the numpy, scikit-learn, or tensorflow/keras functions used in your Python script, precisely because the source code above sets their pseudo-random generators globally to a fixed value.
Answer 2:
Theano's documentation talks about the difficulties of seeding random variables and why they seed each graph instance with its own random number generator.
Sharing a random number generator between different RandomOp instances makes it difficult to produce the same stream regardless of other ops in the graph, and to keep RandomOps isolated. Therefore, each RandomOp instance in a graph will have its very own random number generator. That random number generator is an input to the function. In typical usage, we will use the new features of function inputs (value, update) to pass and update the rng for each RandomOp. By passing RNGs as inputs, it is possible to use the normal methods of accessing function inputs to access each RandomOp's rng. In this approach there is no pre-existing mechanism to work with the combined random number state of an entire graph. So the proposal is to provide the missing functionality (the last three requirements) via auxiliary functions: seed, getstate, setstate.
They also provide examples on how to seed all the random number generators.
You can also seed all of the random variables allocated by a RandomStreams object by that object’s seed method. This seed will be used to seed a temporary random number generator, that will in turn generate seeds for each of the random variables.
>>> srng.seed(902340) # seeds rv_u and rv_n with different seeds each
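The per-op scheme described above can be sketched with plain Python generators (a hypothetical analogy, not Theano's actual implementation): one master seed deterministically spawns a seed for each op's own generator, so re-seeding the master reproduces every stream:

```python
import random

def make_op_rngs(master_seed, n_ops):
    # One master generator spawns an independent seed per "op",
    # mirroring how RandomStreams.seed() seeds each random variable.
    master = random.Random(master_seed)
    return [random.Random(master.randrange(2**30)) for _ in range(n_ops)]

streams_a = [rng.random() for rng in make_op_rngs(902340, 3)]
streams_b = [rng.random() for rng in make_op_rngs(902340, 3)]
assert streams_a == streams_b  # same master seed -> identical per-op streams
```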
Answer 3:
I finally got reproducible results with my code. It's a combination of answers I saw around the web. The first thing is doing what @alex says:
- Set numpy.random.seed;
- Use PYTHONHASHSEED=0 for Python 3.
Then you have to solve the issue noted by @user2805751 regarding cuDNN, by calling your Keras code with the following additional THEANO_FLAGS:
dnn.conv.algo_bwd_filter=deterministic,dnn.conv.algo_bwd_data=deterministic
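For reference, one way these flags are typically passed is via the THEANO_FLAGS environment variable on the command line (the script name here is just the example from the question):

```shell
# Request deterministic cuDNN convolution algorithms (slower but reproducible)
THEANO_FLAGS='dnn.conv.algo_bwd_filter=deterministic,dnn.conv.algo_bwd_data=deterministic' python imdb_lstm.py
```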
And finally, you have to patch your Theano installation as per this comment, which basically consists in:
- replacing all calls to the *_dev20 operator by its regular version in theano/sandbox/cuda/opt.py.
This should get you the same results for the same seed.
Note that there might be a slowdown. I saw a running time increase of about 10%.
Answer 4:
I would like to add something to the previous answers. If you use Python 3 and want reproducible results on every run, you have to:
- set numpy.random.seed at the beginning of your code;
- set PYTHONHASHSEED=0 in the environment before starting the Python interpreter.
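The reason PYTHONHASHSEED must be set before the interpreter starts (rather than from inside the script) is that hash randomization is fixed at startup. A small demonstration that a fixed value makes string hashes stable across interpreter runs:

```python
import os
import subprocess
import sys

def hash_in_fresh_interpreter(seed):
    # Launch a new interpreter with PYTHONHASHSEED fixed,
    # and hash a string inside that fresh process
    env = dict(os.environ, PYTHONHASHSEED=seed)
    out = subprocess.run([sys.executable, "-c", "print(hash('keras'))"],
                         env=env, capture_output=True, text=True)
    return out.stdout.strip()

# With the same fixed seed, two separate interpreter runs agree
assert hash_in_fresh_interpreter("0") == hash_in_fresh_interpreter("0")
```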
Answer 5:
I have trained and tested Sequential()-type neural networks using Keras, performing non-linear regression on noisy speech data. I used the following code to set the random seed:
import numpy as np
seed = 7
np.random.seed(seed)
I get the exact same val_loss each time I train and test on the same data.
Answer 6:
This works for me:
SEED = 123456
import os
import random as rn
import numpy as np
from tensorflow import set_random_seed
os.environ['PYTHONHASHSEED']=str(SEED)
np.random.seed(SEED)
set_random_seed(SEED)
rn.seed(SEED)
Answer 7:
I agree with the previous comment, but reproducible results sometimes need the same environment (e.g. installed packages, machine characteristics, and so on). So I recommend copying your environment to another place in order to have reproducible results. Try one of the following technologies:
- Docker. If you have Linux, it is very easy to move your environment to another place. You can also try DockerHub.
- Binder. This is a cloud platform for reproducing scientific experiments.
- Everware. This is yet another cloud platform for "reusable science". See the project repository on GitHub.
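Short of a full container, the lightest-weight version of this advice is pinning exact package versions with pip (a generic sketch; the file name is just the usual convention):

```shell
# On the machine that produced the results: record exact versions
pip freeze > requirements.txt

# On the target machine: recreate the same package set
pip install -r requirements.txt
```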
Source: https://stackoverflow.com/questions/32419510/how-to-get-reproducible-results-in-keras