Model with BatchNormalization: stagnant test loss

Posted by 三世轮回 on 2019-12-23 01:44:03

Question


I wrote a neural network using Keras. It contains BatchNormalization layers.

When I trained it with model.fit, everything was fine. When I train it with raw TensorFlow as explained here, training is fine, but the validation step always gives very poor performance and quickly saturates (the accuracy goes 5%, 10%, 40%, 40%, 40%, ...; the loss is stagnant too).

I need to use TensorFlow because it gives me more flexibility in monitoring the training.

I strongly suspect it has something to do with the BN layers and/or the way I compute the test performance (see below):

feed_dict = {x: X_valid,
             batch_size_placeholder: X_valid.shape[0],
             K.learning_phase(): 0,
             beta: self.warm_up_schedule(global_step)
             }
if self.weights is not None:
    feed_dict[weights] = self.weights
acc = accuracy.eval(feed_dict=feed_dict)

Is there anything special to do when computing the validation accuracy of a model containing Keras BatchNormalization layers?

Thank you in advance!


Answer 1:


Actually, I found out about the training argument of the __call__ method of the BatchNormalization layer.

So what you can do when instantiating the layer is just:

from keras import backend as K
from keras.layers import Input, Dense, BatchNormalization

x = Input((dim1, dim2))
h = Dense(dim3)(x)
h = BatchNormalization()(h, training=K.learning_phase())  # BN follows the learning phase

And when evaluating performance on the validation set, feed K.learning_phase() as 0 so that BatchNormalization normalizes with its moving averages instead of the current batch statistics:

feed_dict = {x: X_valid,
             batch_size_placeholder: X_valid.shape[0],
             K.learning_phase(): 0,
             beta: self.warm_up_schedule(global_step)
             }
acc = accuracy.eval(feed_dict=feed_dict)
summary_ = merged.eval(feed_dict=feed_dict)
test_writer.add_summary(summary_, global_step)
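
For context, here is a minimal end-to-end sketch of this pattern in a raw TensorFlow training loop. It assumes standalone Keras 2.x on a TensorFlow 1.x backend (the era of this question); the layer sizes, learning rate, and data names are hypothetical placeholders. One caveat the snippet above glosses over: model.fit runs BatchNormalization's moving-average updates for you, but a hand-written TensorFlow loop must run them explicitly, otherwise the inference-mode statistics are never updated and validation accuracy can stagnate, which could explain the symptom described in the question.

# Minimal sketch, assuming standalone Keras 2.x on a TensorFlow 1.x backend;
# sizes, learning rate, and data names are hypothetical placeholders.
import tensorflow as tf
from keras import backend as K
from keras.layers import Input, Dense, BatchNormalization
from keras.models import Model

sess = tf.Session()
K.set_session(sess)  # build the Keras layers into this session's graph

dim_in, dim_hidden, n_classes = 20, 64, 10  # hypothetical sizes

x = Input(shape=(dim_in,))
h = Dense(dim_hidden, activation='relu')(x)
# Tie BN's mode to the learning-phase placeholder: batch statistics when
# it is fed 1 (training), moving averages when it is fed 0 (inference).
h = BatchNormalization()(h, training=K.learning_phase())
logits = Dense(n_classes)(h)
model = Model(x, logits)  # used here only to collect BN's update ops

labels = tf.placeholder(tf.int64, shape=(None,))
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=logits))
accuracy = tf.reduce_mean(tf.cast(
    tf.equal(tf.argmax(logits, axis=1), labels), tf.float32))

# model.fit runs BN's moving-average updates automatically; a raw TF loop
# must run them itself, here by chaining them to the optimizer step.
with tf.control_dependencies(model.updates):
    train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

sess.run(tf.global_variables_initializer())

# Training step: learning phase 1, BN normalizes with batch statistics.
#   sess.run(train_op, {x: X_batch, labels: y_batch, K.learning_phase(): 1})
# Validation: learning phase 0, BN normalizes with its moving averages.
#   acc = sess.run(accuracy, {x: X_valid, labels: y_valid, K.learning_phase(): 0})

The only difference between a training step and a validation step is the value fed to K.learning_phase(): 1 selects batch statistics (and the chained update ops refresh the moving averages), while 0 selects the moving averages.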


Source: https://stackoverflow.com/questions/43654483/model-with-batchnormalization-stagnant-test-loss
