Calculating gradient norm wrt weights with keras

Asked by 礼貌的吻别 on 2021-01-01 03:33

I am attempting to calculate the gradient norm with respect to the weights of a neural network with Keras (as a diagnostic tool). Eventually, I want to create a callback for …

2 Answers
  •  心在旅途 (answered 2021-01-01 03:53)

    There are several placeholders related to the gradient computation process in Keras:

    1. Input x
    2. Target y
    3. Sample weights: even if you don't provide them in model.fit(), Keras still creates a placeholder for sample weights and feeds np.ones((y.shape[0],), dtype=K.floatx()) into the graph during training.
    4. Learning phase: this placeholder will be connected to the gradient tensor only if there's any layer using it (e.g. Dropout).

    So, in the example you provided, to compute the gradients you need to feed x, y and sample_weights into the graph. That is the underlying reason for the error.
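    As a quick illustration (a minimal sketch, assuming the same Keras version the answer targets, in which the internal _feed_* attributes and the model.model wrapper used below exist after compile()), you can inspect these feeds directly:

    from keras import backend as K
    from keras.models import Sequential
    from keras.layers import Dense

    model = Sequential([Dense(2, input_shape=(1,)), Dense(1)])
    model.compile(loss='mse', optimizer='rmsprop')

    inner = model.model  # Sequential wraps an inner Model (see the note at the end of this answer)
    print(inner._feed_inputs)          # placeholder(s) for the input x
    print(inner._feed_targets)         # placeholder(s) for the target y
    print(inner._feed_sample_weights)  # placeholder(s) for the sample weights
    print(K.learning_phase())          # learning-phase tensor, only wired in if e.g. Dropout is used
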

    Inside Model._make_train_function(), the following lines show how the necessary inputs to K.function() are constructed in this case:

    inputs = self._feed_inputs + self._feed_targets + self._feed_sample_weights
    if self.uses_learning_phase and not isinstance(K.learning_phase(), int):
        inputs += [K.learning_phase()]
    
    with K.name_scope('training'):
        ...
        self.train_function = K.function(inputs,
                                         [self.total_loss] + self.metrics_tensors,
                                         updates=updates,
                                         name='train_function',
                                         **self._function_kwargs)
    

    By mimicking this function, you should be able to get the norm value:

    import numpy as np
    from keras import backend as K
    from keras.layers import Dense
    from keras.models import Sequential

    def get_gradient_norm_func(model):
        # Gradients of the total loss w.r.t. every trainable weight tensor
        grads = K.gradients(model.total_loss, model.trainable_weights)
        summed_squares = [K.sum(K.square(g)) for g in grads]
        norm = K.sqrt(sum(summed_squares))
        # Same feeds as the training function: inputs, targets and sample weights
        inputs = model.model._feed_inputs + model.model._feed_targets + model.model._feed_sample_weights
        func = K.function(inputs, [norm])
        return func

    def main():
        x = np.random.random((128,)).reshape((-1, 1))
        y = 2 * x
        model = Sequential(layers=[Dense(2, input_shape=(1,)),
                                   Dense(1)])
        model.compile(loss='mse', optimizer='rmsprop')
        get_gradient = get_gradient_norm_func(model)
        history = model.fit(x, y, epochs=1)
        print(get_gradient([x, y, np.ones(len(y))]))

    if __name__ == '__main__':
        main()
    

    Execution output:

    Epoch 1/1
    128/128 [==============================] - 0s - loss: 2.0073     
    [4.4091368]
    

    Note that since you're using Sequential instead of Model, model.model._feed_* is required instead of model._feed_*.
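    If your model does contain layers that use the learning phase (bullet 4 above, e.g. Dropout), a variant that mirrors the same check as _make_train_function() would look roughly like this. This is a sketch under that assumption; the function name and the trailing phase value in the feed list are my own additions:

    def get_gradient_norm_func_with_phase(model):
        grads = K.gradients(model.total_loss, model.trainable_weights)
        norm = K.sqrt(sum([K.sum(K.square(g)) for g in grads]))
        inner = model.model  # the Model wrapped by Sequential
        inputs = inner._feed_inputs + inner._feed_targets + inner._feed_sample_weights
        # Mirror _make_train_function(): add the learning-phase tensor only when it is used
        if inner.uses_learning_phase and not isinstance(K.learning_phase(), int):
            inputs += [K.learning_phase()]
        return K.function(inputs, [norm])

    # Append the phase value to the end of the feed list: 1 for training behaviour, 0 for inference, e.g.
    # get_gradient([x, y, np.ones(len(y)), 1])
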

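    Finally, since the question mentions wanting a callback, here is a minimal sketch of how the function above could be wrapped in one. The class name GradientNormLogger and its constructor arguments are my own, not part of the original post:

    import numpy as np
    from keras.callbacks import Callback

    class GradientNormLogger(Callback):
        def __init__(self, model, x_train, y_train):
            super(GradientNormLogger, self).__init__()
            # Build the backend function once; the model must already be compiled
            self.get_gradient = get_gradient_norm_func(model)
            self.x_train = x_train
            self.y_train = y_train

        def on_epoch_end(self, epoch, logs=None):
            sample_weights = np.ones(len(self.y_train))
            norm = self.get_gradient([self.x_train, self.y_train, sample_weights])[0]
            print('Epoch %d: gradient norm = %.6f' % (epoch + 1, norm))

    # Usage:
    # model.fit(x, y, epochs=5, callbacks=[GradientNormLogger(model, x, y)])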