What is the difference between steps and epochs in TensorFlow?

春和景丽 2020-11-28 18:16

In most of the models, there is a steps parameter indicating the number of steps to run over data. But yet I see in most practical usage, we also execute t

6 answers
  •  一整个雨季
    2020-11-28 18:31

    As I am currently experimenting with the tf.estimator API, I would like to add my early findings here too. I don't know yet whether the steps and epochs parameters are used consistently throughout TensorFlow, so for now I am only referring to tf.estimator (specifically tf.estimator.LinearRegressor).

    Training steps defined by num_epochs: steps not explicitly defined

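    import tensorflow as tf  # TF 1.x API; ft_cols, x_train and y_train are defined elsewhere

    estimator = tf.estimator.LinearRegressor(feature_columns=ft_cols)
    train_input = tf.estimator.inputs.numpy_input_fn({'x': x_train}, y_train, batch_size=4, num_epochs=1, shuffle=True)
    estimator.train(input_fn=train_input)  # no steps argument: runs until the input is exhausted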
    

    Comment: I have set num_epochs=1 for the training input, and the doc entry for numpy_input_fn says "num_epochs: Integer, number of epochs to iterate over data. If None will run forever." With num_epochs=1 in the above example, the training runs exactly x_train.size/batch_size steps (in my case 175,000 steps, as x_train had a size of 700,000 and batch_size was 4).

    Training steps defined by num_epochs: steps explicitly set higher than the number of steps implied by num_epochs=1

    estimator = tf.estimator.LinearRegressor(feature_columns=ft_cols)
    train_input = tf.estimator.inputs.numpy_input_fn({'x': x_train}, y_train, batch_size=4, num_epochs=1, shuffle=True)
    estimator.train(input_fn=train_input, steps=200000)
    

    Comment: num_epochs=1 in my case means 175,000 steps (x_train.size/batch_size with x_train.size=700,000 and batch_size=4), and this is exactly the number of steps estimator.train runs, even though the steps parameter was set to 200,000 in estimator.train(input_fn=train_input, steps=200000). The input function is exhausted after one epoch, so training ends before the step limit is reached.

    Training steps defined by steps

    estimator = tf.estimator.LinearRegressor(feature_columns=ft_cols)
    train_input = tf.estimator.inputs.numpy_input_fn({'x': x_train}, y_train, batch_size=4, num_epochs=1, shuffle=True)
    estimator.train(input_fn=train_input, steps=1000)
    

    Comment: Although I have set num_epochs=1 when calling numpy_input_fn, the training stops after 1000 steps. This is because steps=1000 in estimator.train(input_fn=train_input, steps=1000) takes precedence over num_epochs=1 in tf.estimator.inputs.numpy_input_fn({'x':x_train},y_train,batch_size=4,num_epochs=1,shuffle=True): training halts once the step limit is reached, before the epoch is finished.

    Conclusion: Whatever the parameters num_epochs for tf.estimator.inputs.numpy_input_fn and steps for estimator.train define, the smaller of the two limits determines the number of steps that are actually run.
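
    The arithmetic behind the three cases can be sketched in plain Python, without TensorFlow. This is only my reading of the behaviour observed above, not how tf.estimator computes it internally. The helper name effective_steps is hypothetical, and I am assuming that a final partial batch still counts as one step.

```python
def effective_steps(num_examples, batch_size, num_epochs=None, steps=None):
    """Estimate how many training steps run (hypothetical helper, not a TF API).

    num_epochs limits the input data; steps limits the training loop.
    Whichever limit is hit first ends training. Returns None when neither
    limit is set, i.e. training would run until interrupted.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    limits = []
    if num_epochs is not None:
        total_examples = num_examples * num_epochs
        # Integer ceiling division; assumption: a trailing partial batch is one step.
        limits.append((total_examples + batch_size - 1) // batch_size)
    if steps is not None:
        limits.append(steps)

    return min(limits) if limits else None


# The three cases from this answer: 700,000 examples, batch_size=4, num_epochs=1.
print(effective_steps(700_000, 4, num_epochs=1))                 # steps not set
print(effective_steps(700_000, 4, num_epochs=1, steps=200_000))  # steps above the epoch limit
print(effective_steps(700_000, 4, num_epochs=1, steps=1_000))    # steps below the epoch limit
```

    With the numbers from my experiments this gives 175000, 175000 and 1000, matching the step counts reported in the three comments.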
