I've noticed that the new Estimator API automatically saves checkpoints during training and automatically restarts from the last checkpoint when training is interrupted. Unfortunately, it seems to keep only the last 5 checkpoints.
Do you know how to control the number of checkpoints that are kept during training?
TensorFlow's tf.estimator.Estimator takes config as an optional argument, which can be a tf.estimator.RunConfig object to configure runtime settings. You can achieve this as follows:
import tensorflow as tf

# Raise the maximum number of checkpoints kept from the default of 5 to 25
run_config = tf.estimator.RunConfig()
run_config = run_config.replace(keep_checkpoint_max=25)

# Build your estimator; model_fn and job_dir are defined elsewhere in your code
estimator = tf.estimator.Estimator(model_fn,
                                   model_dir=job_dir,
                                   config=run_config,
                                   params=None)
The config parameter is available in all classes (DNNClassifier, DNNLinearCombinedClassifier, LinearClassifier, etc.) that extend estimator.Estimator.
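To make the effect of keep_checkpoint_max concrete, here is a minimal pure-Python sketch of a "keep only the newest N" retention policy. It is a toy model, not TensorFlow's implementation: the CheckpointRetention class and the surviving_checkpoints helper are hypothetical names made up for illustration, and the model.ckpt-<step> naming only mimics what checkpoint files typically look like. It also assumes, as the RunConfig documentation describes, that a value of None or 0 means every checkpoint is kept.

```python
from collections import deque


class CheckpointRetention:
    """Toy model of a 'keep the newest N checkpoints' policy (not TensorFlow code)."""

    def __init__(self, keep_max=5):
        # Assumption: None or 0 disables deletion, so all checkpoints are kept.
        self.keep_max = keep_max
        self.kept = deque()   # checkpoint names still on disk, oldest first
        self.deleted = []     # checkpoint names that have been removed

    def save(self, step):
        """Record a new checkpoint and drop the oldest ones beyond the limit."""
        name = "model.ckpt-%d" % step
        self.kept.append(name)
        if self.keep_max:
            while len(self.kept) > self.keep_max:
                self.deleted.append(self.kept.popleft())
        return name


def surviving_checkpoints(steps, keep_max):
    """Return the checkpoint names left after saving at every step in `steps`."""
    policy = CheckpointRetention(keep_max)
    for step in steps:
        policy.save(step)
    return list(policy.kept)


if __name__ == "__main__":
    steps = range(100, 1100, 100)  # ten checkpoints: steps 100 through 1000
    print(surviving_checkpoints(steps, keep_max=5))        # only the newest five remain
    print(len(surviving_checkpoints(steps, keep_max=25)))  # limit not reached, all ten remain
```

With the limit of 5 that the question describes, only the five most recent checkpoints survive a run of ten saves. Raising the limit to 25 keeps all ten, which is the behavior the replace(keep_checkpoint_max=25) call above is meant to give you.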
Source: https://stackoverflow.com/questions/48028262/how-to-control-amount-of-checkpoint-kept-by-tensorflow-estimator