Calculating gradient norm wrt weights with keras

Asked by 礼貌的吻别 on 2021-01-01 03:33

I am attempting to calculate the gradient norm with respect to the weights of a neural network with Keras (as a diagnostic tool). Eventually, I want to create a callback for …

2 Answers
  •  心在旅途 (answered 2021-01-01 03:53)

    There are several placeholders related to the gradient computation process in Keras:

    1. Input x
    2. Target y
    3. Sample weights: even if you don't provide them in model.fit(), Keras still creates a placeholder for sample weights and feeds np.ones((y.shape[0],), dtype=K.floatx()) into the graph during training.
    4. Learning phase: this placeholder will be connected to the gradient tensor only if there's any layer using it (e.g. Dropout).

    So, in the example you provided, to compute the gradients you need to feed x, y and sample_weights into the graph. That is the underlying reason for the error.
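    As a quick illustration (a minimal sketch, assuming the same Keras version the answer targets, in which the internal _feed_* attributes and the model.model wrapper used below exist after compile()), you can inspect these feeds directly:

    from keras import backend as K
    from keras.models import Sequential
    from keras.layers import Dense

    model = Sequential([Dense(2, input_shape=(1,)), Dense(1)])
    model.compile(loss='mse', optimizer='rmsprop')

    inner = model.model  # Sequential wraps an inner Model (see the note at the end of this answer)
    print(inner._feed_inputs)          # placeholder(s) for the input x
    print(inner._feed_targets)         # placeholder(s) for the target y
    print(inner._feed_sample_weights)  # placeholder(s) for the sample weights
    print(K.learning_phase())          # learning-phase tensor, only wired in if e.g. Dropout is used
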

    Inside Model._make_train_function(), the following lines show how the necessary inputs to K.function() are constructed in this case:

    inputs = self._feed_inputs + self._feed_targets + self._feed_sample_weights
    if self.uses_learning_phase and not isinstance(K.learning_phase(), int):
        inputs += [K.learning_phase()]
    
    with K.name_scope('training'):
        ...
        self.train_function = K.function(inputs,
                                         [self.total_loss] + self.metrics_tensors,
                                         updates=updates,
                                         name='train_function',
                                         **self._function_kwargs)
    

    By mimicking this function, you should be able to get the norm value:

    import numpy as np
    from keras import backend as K
    from keras.layers import Dense
    from keras.models import Sequential

    def get_gradient_norm_func(model):
        # Gradients of the total loss w.r.t. every trainable weight tensor
        grads = K.gradients(model.total_loss, model.trainable_weights)
        summed_squares = [K.sum(K.square(g)) for g in grads]
        norm = K.sqrt(sum(summed_squares))
        # Same feeds as the training function: inputs, targets and sample weights
        inputs = model.model._feed_inputs + model.model._feed_targets + model.model._feed_sample_weights
        func = K.function(inputs, [norm])
        return func

    def main():
        x = np.random.random((128,)).reshape((-1, 1))
        y = 2 * x
        model = Sequential(layers=[Dense(2, input_shape=(1,)),
                                   Dense(1)])
        model.compile(loss='mse', optimizer='rmsprop')
        get_gradient = get_gradient_norm_func(model)
        history = model.fit(x, y, epochs=1)
        print(get_gradient([x, y, np.ones(len(y))]))

    if __name__ == '__main__':
        main()
    

    Execution output:

    Epoch 1/1
    128/128 [==============================] - 0s - loss: 2.0073     
    [4.4091368]
    

    Note that since you're using Sequential instead of Model, model.model._feed_* is required instead of model._feed_*.
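    If your model does contain layers that use the learning phase (bullet 4 above, e.g. Dropout), a variant that mirrors the same check as _make_train_function() would look roughly like this. This is a sketch under that assumption; the function name and the trailing phase value in the feed list are my own additions:

    def get_gradient_norm_func_with_phase(model):
        grads = K.gradients(model.total_loss, model.trainable_weights)
        norm = K.sqrt(sum([K.sum(K.square(g)) for g in grads]))
        inner = model.model  # the Model wrapped by Sequential
        inputs = inner._feed_inputs + inner._feed_targets + inner._feed_sample_weights
        # Mirror _make_train_function(): add the learning-phase tensor only when it is used
        if inner.uses_learning_phase and not isinstance(K.learning_phase(), int):
            inputs += [K.learning_phase()]
        return K.function(inputs, [norm])

    # Append the phase value to the end of the feed list: 1 for training behaviour, 0 for inference, e.g.
    # get_gradient([x, y, np.ones(len(y)), 1])
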

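    Finally, since the question mentions wanting a callback, here is a minimal sketch of how the function above could be wrapped in one. The class name GradientNormLogger and its constructor arguments are my own, not part of the original post:

    import numpy as np
    from keras.callbacks import Callback

    class GradientNormLogger(Callback):
        def __init__(self, model, x_train, y_train):
            super(GradientNormLogger, self).__init__()
            # Build the backend function once; the model must already be compiled
            self.get_gradient = get_gradient_norm_func(model)
            self.x_train = x_train
            self.y_train = y_train

        def on_epoch_end(self, epoch, logs=None):
            sample_weights = np.ones(len(self.y_train))
            norm = self.get_gradient([self.x_train, self.y_train, sample_weights])[0]
            print('Epoch %d: gradient norm = %.6f' % (epoch + 1, norm))

    # Usage:
    # model.fit(x, y, epochs=5, callbacks=[GradientNormLogger(model, x, y)])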