Interval between checkpoints in TensorFlow

Submitted by 我的梦境 on 2019-12-25 16:39:51

Question


How can I specify the interval between two consecutive checkpoints in TensorFlow? There is no option in tf.train.Saver for this. Every time I run the model with a different number of global steps, I get a different interval between checkpoints.


Answer 1:


The tf.train.Saver is a "passive" utility for writing checkpoints, and it only writes a checkpoint when some other code calls its .save() method. Therefore, the rate at which checkpoints are written depends on what framework you are using to train your model:

  • If you are using the low-level TensorFlow API (tf.Session) and writing your own training loop, you can simply insert calls to Saver.save() in your own code. A common approach is to do this based on the iteration count:

    for i in range(NUM_ITERATIONS):
      sess.run(train_op)
      # ...
      if i % 1000 == 0:
        saver.save(sess, ...)  # Write a checkpoint every 1000 steps.
    
  • If you are using tf.train.MonitoredTrainingSession, which writes checkpoints for you, you can specify a checkpoint interval (in seconds) in the constructor. By default it saves a checkpoint every 10 minutes. To change this to every minute, you would do:

    with tf.train.MonitoredTrainingSession(..., save_checkpoint_secs=60) as sess:
      # ...
    
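If you want a time-based interval in a hand-written loop (rather than relying on MonitoredTrainingSession), the same pattern works with a clock instead of a step counter. Below is a minimal sketch of just the interval-gating logic; the saver, sess, and checkpoint path are assumed to come from your own setup, so the actual saver.save() call is left as a comment:

```python
import time

NUM_ITERATIONS = 1000      # stand-in for your real iteration count
SAVE_INTERVAL_SECS = 60    # desired gap between checkpoints, in seconds

def should_checkpoint(last_save_time, interval_secs=SAVE_INTERVAL_SECS):
    """True once at least `interval_secs` have elapsed since the last save."""
    return time.time() - last_save_time >= interval_secs

last_save = time.time()
for step in range(NUM_ITERATIONS):
    # sess.run(train_op)  # your training step goes here
    if should_checkpoint(last_save):
        # saver.save(sess, checkpoint_path)  # write the checkpoint
        last_save = time.time()
```

Unlike the step-count version, this keeps the checkpoint interval stable in wall-clock time even when the per-step cost varies between runs.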



Answer 2:


Thanks! This fixed my problem:

    tf.contrib.slim.learning.train(
        train_op,
        checkpoint_dir,
        log_every_n_steps=args.log_every_n_steps,
        graph=g,
        global_step=model.global_step,
        number_of_steps=args.number_of_steps,
        init_fn=model.init_fn,
        save_summaries_secs=300,
        save_interval_secs=300,
        saver=saver)



Source: https://stackoverflow.com/questions/42738297/interval-between-checkpoints-in-tensorflow
