I've noticed that the new Estimator API automatically saves checkpoints during training and automatically restarts from the last checkpoint when training is interrupted. Unfortunately, it seems to keep only the last 5 checkpoints.
Do you know how to control the number of checkpoints that are kept during training?
TensorFlow's tf.estimator.Estimator takes config as an optional argument, which can be a tf.estimator.RunConfig object that configures runtime settings. You can change the number of retained checkpoints as follows:
import tensorflow as tf

# Change the maximum number of checkpoints kept to 25
run_config = tf.estimator.RunConfig()
run_config = run_config.replace(keep_checkpoint_max=25)

# Build your estimator (model_fn and job_dir are defined elsewhere in your code)
estimator = tf.estimator.Estimator(model_fn,
                                   model_dir=job_dir,
                                   config=run_config,
                                   params=None)
The config parameter is available in all classes (DNNClassifier, DNNLinearCombinedClassifier, LinearClassifier, etc.) that extend tf.estimator.Estimator.
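To make the effect of keep_checkpoint_max concrete, here is a small conceptual sketch in plain Python. It is not TensorFlow's implementation; the helper name retained_checkpoints and the step numbers are made up for illustration. It only models the documented behaviour: older checkpoints are deleted as new ones are written, the default limit is 5, and a value of None or 0 keeps every checkpoint.

```python
from collections import deque


def retained_checkpoints(saved_steps, keep_max=5):
    """Model which checkpoint steps remain after saving at each step.

    Hypothetical helper for illustration only: keeps the most recent
    keep_max entries; keep_max of None or 0 keeps everything.
    """
    if not keep_max:
        return list(saved_steps)
    # A bounded deque drops the oldest entries once it is full.
    return list(deque(saved_steps, maxlen=keep_max))


# Ten checkpoints written at steps 100, 200, ..., 1000.
steps = list(range(100, 1100, 100))

print(retained_checkpoints(steps))               # default limit of 5
print(retained_checkpoints(steps, keep_max=25))  # limit larger than the run
```

With the default limit only the five most recent steps survive, while a limit of 25 keeps all ten checkpoints from this short run.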
Source: https://stackoverflow.com/questions/48028262/how-to-control-amount-of-checkpoint-kept-by-tensorflow-estimator