Strange behaviour in sequence-to-sequence learning for variable-length sequences


Question


I am training a sequence to sequence model for variable length sequences with Keras, but I am running into some unexpected problems. It is unclear to me whether the behaviour I am observing is the desired behaviour of the library and why it would be.

Model Creation

I've made a recurrent model with an embeddings layer and a GRU recurrent layer that illustrates the problem. I used mask_zero=True on the embeddings layer instead of a masking layer, but changing this doesn't seem to make a difference (nor does adding a masking layer before the output):

import numpy
from keras.layers import Embedding, GRU, TimeDistributed, Dense, Input
from keras.models import Model
import keras.preprocessing.sequence

numpy.random.seed(0)

# Integer input sequences of length 3; index 0 is reserved for padding.
input_layer = Input(shape=(3,), dtype='int32', name='input')
# mask_zero=True tells downstream layers to skip the padded (zero) timesteps.
embeddings = Embedding(input_dim=20, output_dim=2, input_length=3, mask_zero=True, name='embeddings')(input_layer)
recurrent = GRU(5, return_sequences=True, name='GRU')(embeddings)
output_layer = TimeDistributed(Dense(1), name='output')(recurrent)
model = Model(input=input_layer, output=output_layer)  # Keras 1 argument names

# Set the output layer's bias to 0.2 so its contribution is recognisable in the predictions.
output_weights = model.layers[-1].get_weights()
output_weights[1] = numpy.array([0.2])
model.layers[-1].set_weights(output_weights)

# sample_weight_mode='temporal' enables per-timestep sample weights.
model.compile(loss='mse', metrics=['mse'], optimizer='adam', sample_weight_mode='temporal')

I use masking and the sample_weight parameter to exclude the padding values from the training/evaluation. I will test this model on one input/output sequence which I pad using the Keras padding function:

# One sequence of length 2, pre-padded with zeros to length 3.
X = [[1, 2]]
X_padded = keras.preprocessing.sequence.pad_sequences(X, dtype='float32', maxlen=3)  # shape (1, 3)
Y = [[[1], [2]]]
Y_padded = keras.preprocessing.sequence.pad_sequences(Y, maxlen=3, dtype='float32')  # shape (1, 3, 1)

Output Shape

Why is the output expected to be formatted in this way? Why can I not use input/output sequences that have exactly the same dimensionality? When I give the targets the same shape as the inputs, model.evaluate(X_padded, Y_padded) gives me a dimensionality error.
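As a minimal sketch of the mismatch (the variable Y_flat below is hypothetical, not from my code): the model's output has shape (batch, timesteps, 1), so the targets need a trailing feature axis that the inputs do not have, which numpy.expand_dims can add:

# Hypothetical targets shaped exactly like the inputs:
Y_flat = keras.preprocessing.sequence.pad_sequences([[1, 2]], maxlen=3, dtype='float32')
print(Y_flat.shape)        # (1, 3) -- does not match the model output (1, 3, 1)

# Adding a trailing feature axis makes the shapes agree:
Y_expanded = numpy.expand_dims(Y_flat, -1)
print(Y_expanded.shape)    # (1, 3, 1) -- matches model.predict(X_padded).shape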

Then, when I run model.predict(X_padded) I get the following output (with numpy.random.seed(0) before generating the model):

[[[ 0.2       ]
  [ 0.19946882]
  [ 0.19175649]]]

Why isn't the first input masked for the output layer? Is the output value computed anyway (and equal to the bias, as the hidden layer values are 0)? This does not seem desirable. Adding a Masking layer before the output layer does not solve this problem.
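As a quick check (a sketch, using the model built above), the prediction at the padded timestep is exactly the output layer's bias, which is consistent with the GRU passing a zero state into the Dense layer there:

preds = model.predict(X_padded)
bias = model.layers[-1].get_weights()[1]  # the 0.2 we set earlier
print(preds[0, 0, 0])  # 0.2 -- prediction at the padded timestep
print(bias[0])         # 0.2 -- Dense(1) applied to a zero vector returns only its bias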

MSE Calculation

Then, when I evaluate the model with model.evaluate(X_padded, Y_padded), it returns the Mean Squared Error (MSE) of the entire sequence (1.3168), including this first value. That is to be expected when the value isn't masked, but it is not what I want.
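A quick numpy check (a sketch, reusing the predictions printed above) confirms that 1.3168 is the unmasked MSE over all three timesteps, and shows what the masked MSE over the two real timesteps would be instead:

preds = numpy.array([[[0.2], [0.19946882], [0.19175649]]])
se = (Y_padded - preds) ** 2              # per-timestep squared errors
print(se.mean())                          # ~1.3168 -- MSE over all 3 timesteps

mask = numpy.array([[0.0, 1.0, 1.0]])     # 0 for the padded timestep
masked_mse = (se[..., 0] * mask).sum() / mask.sum()
print(masked_mse)                         # ~1.9553 -- MSE over the 2 real timesteps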

From the Keras documentation I understand I should use the sample_weight parameter to solve this problem, which I tried:

sample_weight = numpy.array([[0, 1, 1]])  # zero weight for the padded timestep
model_evaluation = model.evaluate(X_padded, Y_padded, sample_weight=sample_weight)
print(model.metrics_names, model_evaluation)

The output I get is

['loss', 'mean_squared_error'] [2.9329459667205811, 1.3168648481369019]

This leaves the metric (MSE) unaltered: it is still the MSE over all values, including the one I wanted masked. Why? This is not what I want when I evaluate my model. The loss value does change, however, and appears to be the MSE over the last two values, normalised so as not to give more weight to longer sequences.

Am I doing something wrong with the sample weights? Also, I really cannot figure out how this loss value came about. What should I do to exclude the padded values from both training and evaluation? (I assume the sample_weight parameter works the same way in the fit function.)
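For completeness, here is a sketch of how I would pass the same per-timestep weights during training (this assumes sample_weight_mode='temporal' was set at compile time, as above):

# Zero-weight the padded timestep during training as well:
model.fit(X_padded, Y_padded, sample_weight=sample_weight, nb_epoch=10)  # nb_epoch is the Keras 1 name; Keras 2 uses epochs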


Answer 1:


It was indeed a bug in the library; in Keras 2 this issue is resolved.
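For anyone porting the snippet from the question, a minimal sketch of the same model in Keras 2 (the layer definitions are unchanged; only the Model arguments were renamed):

from keras.layers import Embedding, GRU, TimeDistributed, Dense, Input
from keras.models import Model

input_layer = Input(shape=(3,), dtype='int32', name='input')
embeddings = Embedding(input_dim=20, output_dim=2, input_length=3,
                       mask_zero=True, name='embeddings')(input_layer)
recurrent = GRU(5, return_sequences=True, name='GRU')(embeddings)
output_layer = TimeDistributed(Dense(1), name='output')(recurrent)

# Keras 2 renamed Model's 'input'/'output' arguments to 'inputs'/'outputs'.
model = Model(inputs=input_layer, outputs=output_layer)
model.compile(loss='mse', metrics=['mse'], optimizer='adam',
              sample_weight_mode='temporal')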



Source: https://stackoverflow.com/questions/39660863/strange-behaviour-sequence-to-sequence-learning-for-variable-length-sequences
