Strange behaviour of the loss function in keras model, with pretrained convolutional base

Asked by 长情又很酷 · 2020-12-03 11:55

I'm trying to create a model in Keras to make numerical predictions from pictures. My model has a DenseNet121 convolutional base, with a couple of additional layers.

2 Answers
  •  自闭症患者
     2020-12-03 12:55

    But dropout layers usually create opposite effect making loss on evaluation less than loss during training.

    Not necessarily! Although some of the neurons are dropped in a dropout layer, bear in mind that the outputs of the remaining neurons are scaled up by 1/(1 - rate) to compensate (so-called inverted dropout). At inference time (i.e. test time) dropout is removed entirely, and considering that you have trained your model for only one epoch, the behaviour you saw may happen. Don't forget that since you are training the model for just one epoch, only a portion of the neurons have been dropped in the dropout layer, but all of them are present at inference time.

    If you continue training the model for more epochs, you can expect the training loss and the test loss (on the same data) to become more or less the same.

    Experiment yourself: set the dropout rate to zero (or remove the Dropout layer(s) entirely) and see whether this still happens. (Note that a Dropout layer has no trainable weights, so setting its trainable attribute to False does not disable it; the rate is what controls it.)
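    The inverted-dropout scaling described above can be observed directly. A minimal sketch using tf.keras (TensorFlow 2 is assumed): with rate 0.5, surviving units are scaled by 1/(1 - 0.5) = 2 during training, while at inference the layer is a no-op.

```python
import numpy as np
from tensorflow import keras

# Dropout with rate 0.5: during training, kept units are scaled by 1/(1 - 0.5) = 2
drop = keras.layers.Dropout(0.5)
x = np.ones((1, 10), dtype="float32")

print(drop(x, training=True).numpy())   # entries are either 0.0 or 2.0
print(drop(x, training=False).numpy())  # identity: all ones
```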


    One may be confused (as I was) by seeing that, after one epoch of training, the training loss is not equal to evaluation loss on the same batch of data. And this is not specific to models with Dropout or BatchNormalization layers. Consider this example:

    from keras import layers, models
    import numpy as np
    
    # A small regression model trained on random data
    model = models.Sequential()
    model.add(layers.Dense(1000, activation='relu', input_dim=100))
    model.add(layers.Dense(1))
    
    model.compile(loss='mse', optimizer='adam')
    x = np.random.rand(32, 100)
    y = np.random.rand(32, 1)
    
    # Train for a single epoch on one batch of 32 samples
    print("Training:")
    model.fit(x, y, batch_size=32, epochs=1)
    
    # Then evaluate on exactly the same data
    print("\nEvaluation:")
    loss = model.evaluate(x, y)
    print(loss)
    

    The output:

    Training:
    Epoch 1/1
    32/32 [==============================] - 0s 7ms/step - loss: 0.1520
    
    Evaluation:
    32/32 [==============================] - 0s 2ms/step
    0.7577340602874756
    

    So why are the losses different if they were computed over the same data, i.e. why is 0.1520 != 0.7577?

    If you ask this, it's because you, like me, have not paid enough attention: that 0.1520 is the loss before updating the parameters of the model (i.e. before the backward pass, or backpropagation), while 0.7577 is the loss after the weights of the model have been updated. Even though the data used is the same, the state of the model when computing those loss values is not the same. (Another question: so why has the loss increased after backpropagation? Simply because you have trained it for only one epoch, so the weight updates are not stable yet.)
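    One way to see this directly is to compare the loss returned by a single training step (which is computed with the pre-update weights) against `evaluate` run before and after it. A minimal sketch using tf.keras; the model and data here are made up for illustration:

```python
import numpy as np
from tensorflow import keras

model = keras.Sequential([keras.layers.Dense(1, input_shape=(5,))])
model.compile(loss="mse", optimizer="adam")
x = np.random.rand(32, 5).astype("float32")
y = np.random.rand(32, 1).astype("float32")

loss_before = model.evaluate(x, y, verbose=0)  # loss with the current weights
step_loss = model.train_on_batch(x, y)         # forward pass uses those same weights,
                                               # then the gradient update is applied
loss_after = model.evaluate(x, y, verbose=0)   # loss with the updated weights

# step_loss matches loss_before; loss_after reflects the new weights
print(step_loss, loss_before, loss_after)
```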

    To confirm this, you can also use the same data batch as the validation data:

    model.fit(x, y, batch_size=32, epochs=1, validation_data=(x,y))
    

    If you run the code above with this modified line, you will get an output like this (obviously the exact values may differ for you):

    Training:
    Train on 32 samples, validate on 32 samples
    Epoch 1/1
    32/32 [==============================] - 0s 15ms/step - loss: 0.1273 - val_loss: 0.5344
    
    Evaluation:
    32/32 [==============================] - 0s 89us/step
    0.5344240665435791
    

    You see that the validation loss and the evaluation loss are exactly the same: that's because validation is performed at the end of the epoch (i.e. when the model weights have already been updated).
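    This equality can also be checked programmatically. A sketch in tf.keras, using a plain Dense model so that no Dropout or BatchNormalization is involved:

```python
import numpy as np
from tensorflow import keras

model = keras.Sequential([keras.layers.Dense(1, input_shape=(4,))])
model.compile(loss="mse", optimizer="adam")
x = np.random.rand(16, 4).astype("float32")
y = np.random.rand(16, 1).astype("float32")

history = model.fit(x, y, epochs=1, validation_data=(x, y), verbose=0)
val_loss = history.history["val_loss"][0]    # computed after the weight update
eval_loss = model.evaluate(x, y, verbose=0)  # same weights, same data

print(val_loss, eval_loss)  # the two values agree
```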
