Strange behaviour of the loss function in keras model, with pretrained convolutional base


长情又很酷 2020-12-03 11:55

I'm trying to create a model in Keras to make numerical predictions from pictures. My model has a DenseNet121 convolutional base, with a couple of additional layer

2 answers

天涯浪人 (original poster)

2020-12-03 12:36
Looks like I found the solution. As I suggested, the problem is with the BatchNormalization layers. They do three things:
1. subtract the mean and normalize by the std
2. collect statistics on the mean and std using a running average
3. train two additional parameters per node (a scale and a shift).
When one sets trainable to False, these two parameters freeze and the layer also stops collecting statistics on the mean and std. But it looks like the layer still normalizes with the statistics of the current training batch during training. Most likely it's a bug in Keras, or maybe they did it on purpose for some reason. As a result, the forward pass during training differs from the forward pass at prediction time, even though the trainable attribute is set to False.
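To make the mismatch concrete, here is a minimal sketch of the three steps in plain NumPy. It is a toy layer, not the Keras implementation: the class name `BatchNormSketch`, the `trainable` flag and the default values are assumptions chosen to mirror the behaviour described above, where a frozen layer stops updating its statistics but still normalizes with the current batch in training mode.

```python
import numpy as np


class BatchNormSketch:
    """Toy batch normalization over the batch axis (hypothetical, not Keras code)."""

    def __init__(self, num_features, momentum=0.99, eps=1e-3):
        self.gamma = np.ones(num_features)        # trained scale, one per feature
        self.beta = np.zeros(num_features)        # trained shift, one per feature
        self.moving_mean = np.zeros(num_features)  # running average of the mean
        self.moving_var = np.ones(num_features)    # running average of the variance
        self.momentum = momentum
        self.eps = eps
        self.trainable = True

    def __call__(self, x, training):
        if training:
            # Training mode: normalize with the statistics of the current batch
            mean, var = x.mean(axis=0), x.var(axis=0)
            if self.trainable:
                # Only an unfrozen layer keeps collecting running statistics
                m = self.momentum
                self.moving_mean = m * self.moving_mean + (1 - m) * mean
                self.moving_var = m * self.moving_var + (1 - m) * var
        else:
            # Inference mode: normalize with the stored running statistics
            mean, var = self.moving_mean, self.moving_var
        return self.gamma * (x - mean) / np.sqrt(var + self.eps) + self.beta


rng = np.random.default_rng(0)
# Data whose statistics differ from the stored ones (mean 0, variance 1)
batch = rng.normal(loc=5.0, scale=2.0, size=(32, 4))

bn = BatchNormSketch(num_features=4)
bn.trainable = False  # "frozen" layer, as in the situation described above

out_train = bn(batch, training=True)    # uses batch statistics
out_infer = bn(batch, training=False)   # uses stored statistics

# Same input, same frozen layer, different outputs in the two modes
max_gap = float(np.abs(out_train - out_infer).max())
print("largest difference between the two modes:", max_gap)
```

If the stored statistics matched the data, the two outputs would be close; the gap appears because the frozen layer carries statistics from a different dataset.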

There are two possible solutions I can think of:
1. Set all BatchNormalization layers to trainable. These layers will then collect statistics from your dataset instead of using the pretrained ones (which can be significantly different!), so all the BatchNorm layers adapt to your custom dataset during training.
2. Split the model into two parts, model = model_base + model_top. Then use model_base to extract features with model_base.predict(), feed these features into model_top, and train only model_top.
I've just tried the first solution and it looks like it's working:
```
model.fit(x=dat[0],y=dat[1],batch_size=32)

Epoch 1/1
32/32 [==============================] - 1s 28ms/step - loss: 3.1053

model.evaluate(x=dat[0],y=dat[1])

32/32 [==============================] - 0s 10ms/step
2.487905502319336
```
This was after some training: one needs to wait until enough statistics on the mean and std have been collected.
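For reference, a rough sketch of how the first solution could be set up with `tf.keras`. The small `base` model here is a hypothetical stand-in for the pretrained DenseNet121 base (so the snippet runs without downloading weights), and the data is random; only the loop over `base.layers` is the point. Layers are frozen one by one rather than through `base.trainable = False`, so that the BatchNormalization layers really stay trainable.

```python
import numpy as np
from tensorflow import keras

# Hypothetical stand-in for the pretrained convolutional base
base = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    keras.layers.Conv2D(8, 3, padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.ReLU(),
    keras.layers.GlobalAveragePooling2D(),
])

# Freeze everything except the BatchNormalization layers
for layer in base.layers:
    layer.trainable = isinstance(layer, keras.layers.BatchNormalization)

# A regression head on top of the base
model = keras.Sequential([base, keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")

# Random placeholder data, only to make the sketch runnable
x = np.random.rand(64, 32, 32, 3).astype("float32")
y = np.random.rand(64, 1).astype("float32")

bn = next(l for l in base.layers if isinstance(l, keras.layers.BatchNormalization))
conv = next(l for l in base.layers if isinstance(l, keras.layers.Conv2D))
mean_before = bn.moving_mean.numpy().copy()
kernel_before = conv.get_weights()[0].copy()

model.fit(x, y, batch_size=32, epochs=1, verbose=0)

mean_after = bn.moving_mean.numpy()
kernel_after = conv.get_weights()[0]

# The BN running statistics move towards the new data, the conv kernel stays frozen
print("BN statistics updated:", not np.allclose(mean_before, mean_after))
print("conv kernel unchanged:", np.allclose(kernel_before, kernel_after))
```

With a real pretrained base the same loop applies to `base.layers`; how long the running statistics take to settle depends on the BatchNormalization momentum and the number of training steps.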

I haven't tried the second solution yet, but I'm pretty sure it will work, since the forward pass will be the same during training and prediction.
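A sketch of what the second solution might look like, again with a small hypothetical `model_base` standing in for the pretrained DenseNet121 base and random placeholder data. The names `model_base` and `model_top` follow the description above; the layer sizes are arbitrary assumptions. Since `predict()` runs the base in inference mode, the features seen while training `model_top` are the same ones it will see at prediction time.

```python
import numpy as np
from tensorflow import keras

# Hypothetical stand-in for the pretrained convolutional base
model_base = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    keras.layers.Conv2D(8, 3, padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.ReLU(),
    keras.layers.GlobalAveragePooling2D(),
])

# The part that actually gets trained; it contains no BatchNormalization
model_top = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1),
])
model_top.compile(optimizer="adam", loss="mse")

# Random placeholder data, only to make the sketch runnable
x = np.random.rand(64, 32, 32, 3).astype("float32")
y = np.random.rand(64, 1).astype("float32")

# Step 1: extract features once, in inference mode
features = model_base.predict(x, batch_size=32, verbose=0)

# Step 2: train only the top model on the extracted features
model_top.fit(features, y, batch_size=32, epochs=1, verbose=0)

# At prediction time, chain the two parts in the same way
preds = model_top.predict(model_base.predict(x, verbose=0), verbose=0)
print(features.shape, preds.shape)
```

One trade-off of this approach is that the base can no longer be fine-tuned and on-the-fly image augmentation becomes awkward, because the features are computed once up front.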

Update: I found a great blog post that discusses this issue in detail. Check it out here