Loss calculation over different batch sizes in Keras

悲&欢浪女 · 2020-12-18 05:58

I know that in theory, the loss of a network over a batch is just the sum of all the individual losses. This is reflected in the Keras code for calculating total loss. Relev…

2 Answers
  •  旧巷少年郎
    2020-12-18 06:27

    I would like to summarize the brilliant answers on this page.

    1. A model certainly needs a single scalar value to optimize (i.e., for gradient descent).
    2. This scalar is calculated at the batch level. (If you set batch_size=1, you are in stochastic gradient descent mode, so the gradient is calculated on that single data point.)
    3. Inside the loss function, a group aggregation such as K.mean() is especially relevant for problems like multi-class classification, where getting the loss for one data point requires summing many scalars across many labels (see the first sketch below).
    4. In the loss history printed by model.fit, the printed loss value is a running average over the batches seen so far, so the value we see is an estimated loss at the scale of batch_size data points rather than the exact loss of the current batch (see the second sketch below).

    5. Be aware that even if we set batch_size=1, the printed history may update at a different batch interval. In my case:

      self.model.fit(x=np.array(single_day_piece), y=np.array(single_day_reward), batch_size=1)
      

    The printed output is:

     1/24 [>.............................] - ETA: 0s - loss: 4.1276
     5/24 [=====>........................] - ETA: 0s - loss: -2.0592
     9/24 [==========>...................] - ETA: 0s - loss: -2.6107
    13/24 [===============>..............] - ETA: 0s - loss: -0.4840
    17/24 [====================>.........] - ETA: 0s - loss: -1.8741
    21/24 [=========================>....] - ETA: 0s - loss: -2.4558
    24/24 [==============================] - 0s 16ms/step - loss: -2.1474
    

    In my problem, there is no way a single data-point loss could reach a magnitude of 4.xxx, so I guess the model took the summed loss of the first 4 data points. However, the batch size for training is not 4.
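
    To make point 3 concrete, here is a minimal sketch of a cross-entropy-style loss. This is not the actual Keras implementation; the function name and tensors are illustrative. The label axis is summed first to get one scalar per data point, and only then is the batch averaged:

      from tensorflow.keras import backend as K

      def crossentropy_sketch(y_true, y_pred):
          # Sum across the label axis: many per-label scalars collapse into
          # one loss value per data point (point 3 above).
          per_sample = -K.sum(y_true * K.log(K.clip(y_pred, K.epsilon(), 1.0)), axis=-1)
          # Average the per-sample losses into the single scalar the
          # optimizer minimizes for the whole batch (points 1 and 2 above).
          return K.mean(per_sample)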
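
    And to illustrate points 4 and 5, the sketch below simulates the running mean that the progress bar reports. The batch-loss values are made up, but they show how several batches get folded into one printed number:

      import numpy as np

      # Hypothetical per-batch losses (batch_size=1, so one loss per data point).
      batch_losses = np.array([4.1276, -3.6, -2.9, -4.1])

      # The progress bar prints the cumulative mean of all batch losses so far,
      # not the loss of the most recent batch on its own.
      running_mean = np.cumsum(batch_losses) / np.arange(1, len(batch_losses) + 1)
      print(running_mean)  # -> [ 4.1276  0.2638 -0.7908 -1.6181]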
