I am trying to implement discriminant condition codes in Keras as proposed in
Xue, Shaofei, et al., "Fast adaptation of deep neural network based on discriminant codes for speech recognition."
The main idea is that you encode each condition as an input parameter and let the network learn the dependency between the condition and the feature-label mapping. On a new dataset, instead of adapting the entire network, you just tune these weights using backprop. For example, say my network looks like this:
X ---->|-----|
       | DNN |----> Y
Z ---->|-----|

X: features    Y: labels    Z: condition codes
Now, given a pretrained DNN and X', Y' on a new dataset, I am trying to estimate the Z' (using backprop) that minimizes the prediction error on Y'. The math seems straightforward, except I am not sure how to implement this in Keras without having access to the backprop itself.
For instance, can I add an Input() layer with trainable=True while all other layers are set to trainable=False? Can backprop in Keras update more than just layer weights? Or is there a way to hack Keras layers to do this?
Any suggestions welcome. Thanks.
I figured out how to do this (exactly) in Keras by looking at fchollet's post here. Using the Keras backend I was able to compute the gradient of my loss w.r.t. Z directly and use it to drive the update.
Code below:
import keras.backend as K
import numpy as np

model.summary()  # pretrained model

loss = K.categorical_crossentropy(Y, Y_out)
grads = K.gradients(loss, Z)[0]  # K.gradients returns a list; take the tensor
grads /= (K.sqrt(K.mean(K.square(grads))) + 1e-5)  # normalize gradient magnitude
iterate = K.function([X, Z], [loss, grads])

step = 0.1
Z_adapt = Z_in.copy()
for i in range(100):
    loss_val, grads_val = iterate([X_in, Z_adapt])
    Z_adapt -= grads_val * step
    print("iter:", i, np.mean(loss_val))

print("Before:")
print(model.evaluate([X_in, Z_in], Y_out))
print("After:")
print(model.evaluate([X_in, Z_adapt], Y_out))
X, Y, Z are nodes in the model graph. Z_in is an initial value for Z'; I set it to an average value from the train set. Z_adapt is the result after 100 iterations of gradient descent and should give you a better result.
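To make the math behind this loop concrete, here is a self-contained NumPy sketch of the same idea on a toy frozen "network" y = softmax(Wx·X + Wz·Z + b). All names and sizes here are illustrative, not from the paper: the pretrained weights stay fixed and only the condition code Z is updated by gradient descent on the cross-entropy loss, exactly as the Keras backend code above does for a real model.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

# Frozen "pretrained" parameters (stand-ins for the DNN weights).
Wx = rng.normal(size=(3, 4))   # features -> logits
Wz = rng.normal(size=(3, 2))   # condition code -> logits
b = np.zeros(3)

X_in = rng.normal(size=4)            # new-dataset features X'
Y_true = np.array([0.0, 1.0, 0.0])   # one-hot label Y'

Z_adapt = np.zeros(2)  # initial condition code (e.g. a train-set average)
step = 0.1
losses = []

for i in range(100):
    probs = softmax(Wx @ X_in + Wz @ Z_adapt + b)
    losses.append(-np.sum(Y_true * np.log(probs + 1e-12)))
    # Cross-entropy gradient w.r.t. Z only; the weights stay frozen.
    grad_Z = Wz.T @ (probs - Y_true)
    Z_adapt -= step * grad_Z
```

After the loop, `losses` should be decreasing: only Z moved, yet the prediction error on Y' drops, which is the whole point of the adaptation.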
Assume that the size of Z is m x n. Then you can first define an input layer of size m*n x 1. The input will be an m*n x 1 vector of ones. You can define a dense layer containing m*n neurons and set trainable=True for it. The response of this layer will give you a flattened version of Z. Reshape it appropriately and feed it as input to the rest of the network, which can be appended after it.
Keep in mind that if the size of Z is too large, the network may not be able to learn a dense layer with that many neurons. In that case, you may need to add additional constraints or look into convolutional layers. However, convolutional layers will put some constraints on Z.
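A NumPy sketch of why this trick works (illustrative, not the answerer's code): when the dense layer's input is a constant vector of ones, its output is just `ones @ W + b`, so the layer's kernel and bias become a free parameterization of Z that backprop can tune while the rest of the network stays frozen. The sizes m and n below are assumptions for the example.

```python
import numpy as np

m, n = 2, 3                   # assumed size of Z
ones = np.ones((1, m * n))    # constant input of ones, shape (1, m*n)

# Dense layer parameters: the only trainable weights in the network.
W = np.zeros((m * n, m * n))                      # kernel, initialized to zero
b = np.random.default_rng(1).normal(size=m * n)   # bias

Z_flat = ones @ W + b      # the dense layer's response, shape (1, m*n)
Z = Z_flat.reshape(m, n)   # reshape back to m x n before feeding the DNN
```

Since any target value of Z can be reached by adjusting W and b, gradient descent on this layer is equivalent to tuning Z directly, which is the same adaptation the backend-gradient answer performs explicitly.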
Source: https://stackoverflow.com/questions/43353851/tune-input-features-using-backprop-in-keras