Loss calculation over different batch sizes in Keras

悲&欢浪女 · 2020-12-18 05:58

I know that in theory, the loss of a network over a batch is just the sum of all the individual losses. This is reflected in the Keras code for calculating total loss. Relev…

2 Answers
  •  旧巷少年郎
    2020-12-18 06:27

    I would like to summarize the brilliant answers on this page.

    1. A model certainly needs a single scalar value to optimize (i.e., for gradient descent).
    2. This scalar is calculated at the batch level. (If you set batch_size=1, you are in stochastic gradient descent mode, so the gradient is calculated on that single data point.)
    3. Inside the loss function, a group aggregation such as K.mean() is especially relevant for problems like multi-class classification, where getting the loss for one data point requires summing many scalars across many labels (see the first sketch below).
    4. In the loss history printed by model.fit, the printed loss value is a running average over the batches seen so far, so the value we see is an estimated loss at the scale of batch_size data points rather than the exact loss of the current batch (see the second sketch below).

    5. Be aware that even if we set batch_size=1, the printed history may update at a different batch interval. In my case:

      self.model.fit(x=np.array(single_day_piece), y=np.array(single_day_reward), batch_size=1)
      

    The printed output is:

     1/24 [>.............................] - ETA: 0s - loss: 4.1276
     5/24 [=====>........................] - ETA: 0s - loss: -2.0592
     9/24 [==========>...................] - ETA: 0s - loss: -2.6107
    13/24 [===============>..............] - ETA: 0s - loss: -0.4840
    17/24 [====================>.........] - ETA: 0s - loss: -1.8741
    21/24 [=========================>....] - ETA: 0s - loss: -2.4558
    24/24 [==============================] - 0s 16ms/step - loss: -2.1474
    

    In my problem, there is no way a single data-point loss could reach a magnitude of 4.xxx, so I guess the model took the summed loss of the first 4 data points. However, the batch size for training is not 4.
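
    To make point 3 concrete, here is a minimal sketch of a cross-entropy-style loss. This is not the actual Keras implementation; the function name and tensors are illustrative. The label axis is summed first to get one scalar per data point, and only then is the batch averaged:

      from tensorflow.keras import backend as K

      def crossentropy_sketch(y_true, y_pred):
          # Sum across the label axis: many per-label scalars collapse into
          # one loss value per data point (point 3 above).
          per_sample = -K.sum(y_true * K.log(K.clip(y_pred, K.epsilon(), 1.0)), axis=-1)
          # Average the per-sample losses into the single scalar the
          # optimizer minimizes for the whole batch (points 1 and 2 above).
          return K.mean(per_sample)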
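
    And to illustrate points 4 and 5, the sketch below simulates the running mean that the progress bar reports. The batch-loss values are made up, but they show how several batches get folded into one printed number:

      import numpy as np

      # Hypothetical per-batch losses (batch_size=1, so one loss per data point).
      batch_losses = np.array([4.1276, -3.6, -2.9, -4.1])

      # The progress bar prints the cumulative mean of all batch losses so far,
      # not the loss of the most recent batch on its own.
      running_mean = np.cumsum(batch_losses) / np.arange(1, len(batch_losses) + 1)
      print(running_mean)  # -> [ 4.1276  0.2638 -0.7908 -1.6181]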
