How to lock specific values of a Tensor in TensorFlow?


Question


I'm trying to apply the lottery ticket hypothesis to a simple neural network written in TensorFlow 2.0 (using the Keras interface) as seen below:

net = models.Sequential()
net.add(layers.Dense(256, activation="softsign", name="Dense0", bias_initializer="ones"))
net.add(layers.Dense(128, activation="softsign", name="Dense1", bias_initializer="ones"))
net.add(layers.Dense(64, activation="softsign", name="Dense2", bias_initializer="ones"))
net.add(layers.Dense(32, activation="softsign", name="Dense3", bias_initializer="ones"))
net.add(layers.Dense(1, activation="tanh", name="Output", bias_initializer="ones"))

And then I train my network using the Adam optimizer and binary cross-entropy loss:

net.compile(optimizer=optimizers.Adam(learning_rate=0.001),
            loss=losses.BinaryCrossentropy(), metrics=["accuracy"])
net.fit(x_train, y_train, epochs=10, batch_size=32, validation_data=(x_test, y_test))

After the training process, I want to lock specific weights in my network. The problem is that I can only mark a Tensor as non-trainable (as far as I know) with tensorflow.Variable(..., trainable=False), but doing that makes the entire node of my graph non-trainable, and I only want specific edges. I can iterate over all Tensor instances of my network with the code below:

for i in range(len(net.layers)):
    for j in range(net.layers[i].variables[0].shape[0]):
        for k in range(net.layers[i].variables[0][j].shape[0]):
            ...

But I don't know what to do next. Does someone know a simple way to do this?


Answer 1:


Maybe you could subclass the Dense layer? Something like

import tensorflow as tf
from tensorflow import keras


class PrunableDense(keras.layers.Dense):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.deleted_channels = None
        self.deleted_bias = None
        self._kernel = None
        self._bias = None


    def build(self, input_shape):
        last_dim = input_shape[-1]
        self._kernel = self.add_weight(
            'kernel',
            shape=[last_dim, self.units],
            initializer=self.kernel_initializer,
            regularizer=self.kernel_regularizer,
            constraint=self.kernel_constraint,
            dtype=self.dtype,
            trainable=True)
        self.deleted_channels = tf.ones([last_dim, self.units]) # we'll use this to prune the network
        if self.use_bias:
            self._bias = self.add_weight(
                'bias',
                shape=[self.units,],
                initializer=self.bias_initializer,
                regularizer=self.bias_regularizer,
                constraint=self.bias_constraint,
                dtype=self.dtype,
                trainable=True)
            self.deleted_bias = tf.ones([self.units,])

    @property
    def kernel(self):
        """gets called whenever self.kernel is used"""
        # only the weights that haven't been deleted should be non-zero
        # deleted weights are 0.'s in self.deleted_channels
        return self.deleted_channels * self._kernel  

    @property
    def bias(self):
        #similar to kernel
        if not self.use_bias:
            return None
        else:
            return self.deleted_bias * self._bias

    def prune_kernel(self, to_be_deleted):
        """
        Delete some channels
        to_be_deleted should be a tensor or numpy array of shape kernel.shape
        containing 1's at the locations where weights should be kept, and 0's 
        at the locations where weights should be deleted.
        """
        self.deleted_channels *= to_be_deleted

    def prune_bias(self, to_be_deleted):
        assert(self.use_bias)
        self.deleted_bias *= to_be_deleted

    def prune_kernel_below_threshold(self, threshold=0.01):
        # prune by magnitude, so that large negative weights are kept as well
        to_be_deleted = tf.cast(tf.greater(tf.abs(self.kernel), threshold), tf.float32)
        self.deleted_channels *= to_be_deleted

    def prune_bias_below_threshold(self, threshold=0.01):
        assert(self.use_bias)
        to_be_deleted = tf.cast(tf.greater(tf.abs(self.bias), threshold), tf.float32)
        self.deleted_bias *= to_be_deleted

I haven't tested this too extensively and it definitely needs some polishing, but I think the idea should work.
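For example, a rough usage sketch (my addition, assuming the class above plus the x_train/y_train data from the question, and a shortened version of the question's network):

# Hypothetical usage: build the network with PrunableDense layers, train it,
# then lock the smallest-magnitude weights at zero.
net = keras.Sequential([
    PrunableDense(256, activation="softsign", name="Dense0", bias_initializer="ones"),
    PrunableDense(128, activation="softsign", name="Dense1", bias_initializer="ones"),
    PrunableDense(1, activation="tanh", name="Output", bias_initializer="ones"),
])
net.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
            loss=keras.losses.BinaryCrossentropy(), metrics=["accuracy"])
net.fit(x_train, y_train, epochs=10, batch_size=32)

for layer in net.layers:
    layer.prune_kernel_below_threshold(threshold=0.01)
# the pruned entries now contribute 0 to every forward pass and receive
# zero gradient, so they stay locked even if you keep training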

Edit: I wrote the above assuming you want to prune the network as in the lottery ticket hypothesis. If you instead just want to freeze part of the weights, you could do something similar, but add a frozen_kernel attribute with non-zero entries only in the places where self.deleted_channels is 0, and add it to the trainable kernel.

Edit 2: with the previous edit I meant something like the following:

class FreezableDense(keras.layers.Dense):


    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.trainable_channels = None
        self.trainable_bias = None
        self._kernel1 = None
        self._bias1 = None
        self._kernel2 = None
        self._bias2 = None


    def build(self, input_shape):
        last_dim = input_shape[-1]
        self._kernel1 = self.add_weight(
            'kernel1',
            shape=[last_dim, self.units],
            initializer=self.kernel_initializer,
            regularizer=self.kernel_regularizer,
            constraint=self.kernel_constraint,
            dtype=self.dtype,
            trainable=True)
        self._kernel2 = tf.zeros([last_dim, self.units])
        self.trainable_channels = tf.ones([last_dim, self.units]) # we'll use this to freeze parts of the network
        if self.use_bias:
            self._bias1 = self.add_weight(
                'bias',
                shape=[self.units,],
                initializer=self.bias_initializer,
                regularizer=self.bias_regularizer,
                constraint=self.bias_constraint,
                dtype=self.dtype,
                trainable=True)
            self._bias2 = tf.zeros([self.units,])
            self.trainable_bias = tf.ones([self.units,])

    @property
    def kernel(self):
        """gets called whenever self.kernel is used"""
        # trainable entries come from self._kernel1, frozen entries from self._kernel2
        return self.trainable_channels * self._kernel1 + (1 - self.trainable_channels) * self._kernel2

    @property
    def bias(self):
        #similar to kernel
        if not self.use_bias:
            return None
        else:
            return self.trainable_bias * self._bias1 + (1 - self.trainable_bias) * self._bias2

    def freeze_kernel(self, to_be_frozen):
        """
        freeze some channels
        to_be_frozen should be a tensor or numpy array of shape kernel.shape
        containing 1's at the locations where weights should be kept trainable, and 0's 
        at the locations where weights should be frozen.
        """
        # we want to do two things: update the weights in self._kernel2 
        # and update self.trainable_channels
        # first we update self._kernel2 with all newly frozen weights
        newly_frozen = 1 - tf.maximum((1 - to_be_frozen) - (1 - self.trainable_channels), 0)
        # the above should have 0 only where to_be_frozen is 0 and self.trainable_channels is 1
        # if I'm not mistaken that is
        newly_frozen_weights = (1-newly_frozen)*self._kernel1
        self._kernel2 += newly_frozen_weights

        # now we update self.trainable_channels:
        self.trainable_channels *= to_be_frozen

    def freeze_bias(self, to_be_frozen):
        # same bookkeeping as freeze_kernel, but for the bias
        assert(self.use_bias)
        newly_frozen = 1 - tf.maximum((1 - to_be_frozen) - (1 - self.trainable_bias), 0)
        newly_frozen_bias = (1 - newly_frozen) * self._bias1
        self._bias2 += newly_frozen_bias
        self.trainable_bias *= to_be_frozen

(Again, not too extensively tested and it definitely needs some polishing, but I think the idea should work.)
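For example, a rough sketch of how the freezing could be used (my addition; the mask here is purely illustrative):

# Hypothetical usage: freeze part of one layer's kernel at its current values.
layer = FreezableDense(64, activation="softsign")
_ = layer(tf.zeros([1, 128]))  # run once so the layer builds its weights

# keep trainable (1) the entries with |w| <= 0.5, freeze (0) the rest
to_be_frozen = tf.cast(tf.abs(layer.kernel) <= 0.5, tf.float32)
layer.freeze_kernel(to_be_frozen)
# frozen entries keep the value they had at the moment of freezing and
# receive zero gradient; entries where the mask is 1 keep training as usual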

Edit 3: Some more googling turned up what I initially couldn't find: https://www.tensorflow.org/model_optimization/api_docs/python/tfmot/sparsity/keras might provide tools to build a pruned model more easily.
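With that package, magnitude-based pruning looks roughly like the following (my sketch, not from the original answer; check the tfmot docs for the exact API of your version):

import tensorflow_model_optimization as tfmot

pruning_params = {
    "pruning_schedule": tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0, final_sparsity=0.5, begin_step=0, end_step=1000),
}
pruned_net = tfmot.sparsity.keras.prune_low_magnitude(net, **pruning_params)
pruned_net.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
pruned_net.fit(x_train, y_train, epochs=10, batch_size=32,
               callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])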

Edit 4 (further explanation of the role of _kernel2 and _bias2):

For simplicity I'll do the explanation without the bias, but mutatis mutandis everything works the same with the bias. Suppose the input to your dense layer is n-dimensional and your output is m-dimensional; then what a dense layer does is multiply the input by an m-by-n matrix, which we'll call K for short (it's the kernel).

Usually we want to learn the right entries of K through some gradient-based optimisation method, but in your case you want to keep certain entries fixed. That's why this custom Dense layer splits K as follows:

K = T * K1 + (1 - T) * K2,

where

  • T is an m-by-n matrix consisting of 0's and 1's,
  • the asterisk denotes element-wise multiplication,
  • 1 is the m-by-n matrix with a 1 in every entry,
  • K1 is an m-by-n matrix whose entries can be learned,
  • K2 is an m-by-n matrix which is fixed (constant) during training.

If we look at the entries of K, then K[i,j] = T[i,j]*K1[i,j] + (1-T[i,j])*K2[i,j] = K1[i,j] if T[i,j]==1 else K2[i,j]. Since in the latter case the value of K1[i,j] has no influence on the outcome of multiplication by K, its gradient is 0 and shouldn't change (and even if it does change due to numerical errors, that shouldn't have an effect on the value of K[i,j]).

So essentially, the entries K[i,j] of K for which T[i,j]==0 are fixed (with value stored in K2), and those for which T[i,j]==1 can be trained.
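A tiny sanity check of this decomposition (my addition, not part of the original answer): the gradient with respect to K1 vanishes exactly where T is 0, so the frozen entries never move.

import tensorflow as tf

T  = tf.constant([[1., 0.], [0., 1.]])      # 1 = trainable, 0 = frozen
K1 = tf.Variable([[0.5, 0.5], [0.5, 0.5]])  # learnable entries
K2 = tf.constant([[9., 9.], [9., 9.]])      # fixed values for the frozen entries

x = tf.constant([[1., 2.]])
with tf.GradientTape() as tape:
    K = T * K1 + (1. - T) * K2
    y = tf.reduce_sum(x @ K)
print(tape.gradient(y, K1))  # zero exactly where T == 0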



Source: https://stackoverflow.com/questions/58421182/how-to-lock-specific-values-of-a-tensor-in-tensorflow
