Where do I call the BatchNormalization function in Keras?

陌清茗 2020-12-02 03:23

If I want to use the BatchNormalization function in Keras, then do I need to call it once only at the beginning?

I read this documentation for it: http://keras.io/la

7 Answers
  •  情书的邮戳
    2020-12-02 04:17

    Batch Normalization is used to normalize the input layer as well as the hidden layers by adjusting the mean and scale of the activations. Because of this normalizing effect of the extra layer, deep networks can use higher learning rates without vanishing or exploding gradients. Furthermore, batch normalization has a regularizing effect that helps the network generalize, and it can reduce the need for dropout to mitigate overfitting.

    Right after the linear transformation computed by, say, Dense() or Conv2D() in Keras, we add BatchNormalization(), which normalizes that layer's output, and then we add the non-linearity with Activation().

    # Imports needed to run this snippet (Keras 2-style API)
    from keras.models import Sequential
    from keras.layers import Dense, Activation, Dropout, BatchNormalization
    from keras.optimizers import SGD

    model = Sequential()
    model.add(Dense(64, input_dim=14, kernel_initializer='uniform'))
    model.add(BatchNormalization(epsilon=1e-06, momentum=0.9))  # normalize the pre-activations of the first Dense layer
    model.add(Activation('tanh'))
    model.add(Dropout(0.5))
    model.add(Dense(64, kernel_initializer='uniform'))
    model.add(BatchNormalization(epsilon=1e-06, momentum=0.9))
    model.add(Activation('tanh'))
    model.add(Dropout(0.5))
    model.add(Dense(2, kernel_initializer='uniform'))
    model.add(BatchNormalization(epsilon=1e-06, momentum=0.9))
    model.add(Activation('softmax'))

    sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
    model.fit(X_train, y_train, epochs=20, batch_size=16,
              validation_split=0.2, verbose=2)
    

    How is Batch Normalization applied?

    Suppose we have input a[l-1] to a layer l, along with weights W[l] and a bias unit b[l] for that layer. Let a[l] be the activation vector of layer l (i.e. after applying the non-linearity) and let z[l] be the vector before the non-linearity is applied.

    1. Using a[l-1] and W[l], we can calculate z[l] for the layer l.
    2. Usually in feed-forward propagation we would add the bias unit to z[l] at this stage, i.e. z[l] + b[l], but in Batch Normalization this addition of b[l] is not required and no b[l] parameter is used: any constant shift is cancelled by the mean subtraction in step 3 and absorbed into β in step 5.
    3. Calculate the mean of z[l] and subtract it from each element.
    4. Divide (z[l] - mean) by the standard deviation. Call the result Z_temp[l].
    5. Now define new learnable parameters γ and β that rescale and shift the hidden layer as follows:

      z_norm[l] = γ · Z_temp[l] + β

    In the code excerpt above, Dense() takes a[l-1], uses W[l] and calculates z[l]. The BatchNormalization() that immediately follows performs the steps above to give z_norm[l], and the Activation() after that computes tanh(z_norm[l]) to give a[l], i.e.

    a[l] = tanh(z_norm[l])
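
    To make the mapping between these steps and the Keras layers concrete, below is a minimal NumPy sketch of the same forward computation for a single Dense -> BatchNormalization -> Activation('tanh') block. The names (batchnorm_dense_forward, a_prev, W, gamma, beta, eps) are illustrative, not Keras API, and this is a simplified per-batch view that ignores the running statistics Keras tracks for inference.

    import numpy as np

    def batchnorm_dense_forward(a_prev, W, gamma, beta, eps=1e-6):
        # a_prev: activations a[l-1] from the previous layer, shape (batch, n_in)
        # W:      weight matrix W[l], shape (n_in, n_out); note there is no bias b[l]
        # gamma, beta: learnable scale and shift, shape (n_out,)
        z = a_prev @ W                      # step 1 (and step 2: no b[l] is added)
        mean = z.mean(axis=0)               # step 3: per-feature batch mean
        std = z.std(axis=0)                 # step 4: per-feature batch std
        z_temp = (z - mean) / (std + eps)   # steps 3-4: normalize (eps avoids division by zero)
        z_norm = gamma * z_temp + beta      # step 5: z_norm[l] = γ·Z_temp[l] + β
        return np.tanh(z_norm)              # a[l] = tanh(z_norm[l])

    # Illustrative usage with random data (shapes match the first layer of the model above)
    rng = np.random.default_rng(0)
    a_prev = rng.normal(size=(16, 14))      # batch of 16 examples, 14 input features
    W = rng.normal(size=(14, 64))
    gamma, beta = np.ones(64), np.zeros(64)
    a_l = batchnorm_dense_forward(a_prev, W, gamma, beta)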
    
