Sequence to Sequence - for time series prediction

孤街浪徒 2020-12-17 02:12

I've tried to build a sequence to sequence model to predict a sensor signal over time based on its first few inputs (see figure below).

The model works OK, but I wa

2 Answers
  • 2020-12-17 02:39

    THIS IS THE ANSWER TO THE EDITED QUESTION

    First of all, when you call fit, decoder_inputs is a Keras tensor and you can't use it to fit your model. The author of the code you cited uses an array of zeros, so you have to do the same (I do it in the dummy example below).

    Secondly, look at your output layer in the model summary... it is 3D, so you have to pass your target as a 3D array.

    Thirdly, the decoder input must have 1 feature dimension, not 20 as in your code.
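
    Concretely, a minimal sketch of points two and three (train_y_2d is a hypothetical 2D target of shape (439, 56); the dummy-data block further below builds the 3D arrays directly):

    import numpy as np

    # Hypothetical 2D target with shape (samples, output_steps) = (439, 56)
    train_y_2d = np.random.uniform(0, 1, (439, 56))

    train_y = train_y_2d.reshape(439, 56, 1)      # 3D target matching the 3D output layer
    decoder_zero_inputs = np.zeros((439, 56, 1))  # zeros with a single feature dimension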

    Set initial parameters

    import numpy as np
    import keras

    layers = [35, 35]
    learning_rate = 0.01
    decay = 0 
    optimiser = keras.optimizers.Adam(lr=learning_rate, decay=decay)
    
    num_input_features = 20
    num_output_features = 1
    loss = "mse"
    
    lambda_regulariser = 0.000001
    regulariser = None
    
    batch_size = 128
    steps_per_epoch = 200
    epochs = 100
    

    Define the encoder

    encoder_inputs = keras.layers.Input(shape=(None, num_input_features), name='encoder_input')
    
    encoder_cells = []
    for hidden_neurons in layers:
        encoder_cells.append(keras.layers.GRUCell(hidden_neurons,
                                                  kernel_regularizer=regulariser,
                                                  recurrent_regularizer=regulariser,
                                                  bias_regularizer=regulariser))
    
    encoder = keras.layers.RNN(encoder_cells, return_state=True, name='encoder_layer')
    encoder_outputs_and_states = encoder(encoder_inputs)
    encoder_states = encoder_outputs_and_states[1:] # discard the output, keep only the final states (one per GRU cell)
    

    Define the decoder (input has 1 feature dimension!)

    decoder_inputs = keras.layers.Input(shape=(None, 1), name='decoder_input') #### <=== must be 1
    
    decoder_cells = []
    for hidden_neurons in layers:
        decoder_cells.append(keras.layers.GRUCell(hidden_neurons,
                                                  kernel_regularizer=regulariser,
                                                  recurrent_regularizer=regulariser,
                                                  bias_regularizer=regulariser))
    
    decoder = keras.layers.RNN(decoder_cells, return_sequences=True, return_state=True, name='decoder_layer')
    decoder_outputs_and_states = decoder(decoder_inputs, initial_state=encoder_states)
    
    decoder_outputs = decoder_outputs_and_states[0] # only keep the output sequence
    decoder_dense = keras.layers.Dense(num_output_features,
                                       activation='linear',
                                       kernel_regularizer=regulariser,
                                       bias_regularizer=regulariser)
    
    decoder_outputs = decoder_dense(decoder_outputs)
    

    Define the model

    model = keras.models.Model(inputs=[encoder_inputs, decoder_inputs], outputs=decoder_outputs)
    model.compile(optimizer=optimiser, loss=loss)
    model.summary()
    
    Layer (type)                    Output Shape         Param #     Connected to                     
    ==================================================================================================
    encoder_input (InputLayer)      (None, None, 20)     0                                            
    __________________________________________________________________________________________________
    decoder_input (InputLayer)      (None, None, 1)      0                                            
    __________________________________________________________________________________________________
    encoder_layer (RNN)             [(None, 35), (None,  13335       encoder_input[0][0]              
    __________________________________________________________________________________________________
    decoder_layer (RNN)             [(None, None, 35), ( 11340       decoder_input[0][0]              
                                                                     encoder_layer[0][1]              
                                                                     encoder_layer[0][2]              
    __________________________________________________________________________________________________
    dense_4 (Dense)                 (None, None, 1)      36          decoder_layer[0][0]              
    ==================================================================================================
    

    This is my dummy data, with the same shapes as yours. Pay attention to decoder_zero_inputs: it has the same dimensions as your y, but it is an array of zeros.

    train_x = np.random.uniform(0,1, (439, 5, 20))
    train_y = np.random.uniform(0,1, (439, 56, 1))
    validation_x = np.random.uniform(0,1, (10, 5, 20))
    validation_y = np.random.uniform(0,1, (10, 56, 1))
    decoder_zero_inputs = np.zeros((439, 56, 1)) ### <=== attention
    

    Fitting

    history = model.fit([train_x, decoder_zero_inputs],train_y, epochs=epochs,
                         validation_split=0.3, verbose=1)
    
    Epoch 1/100
    307/307 [==============================] - 2s 8ms/step - loss: 0.1038 - val_loss: 0.0845
    Epoch 2/100
    307/307 [==============================] - 1s 2ms/step - loss: 0.0851 - val_loss: 0.0832
    Epoch 3/100
    307/307 [==============================] - 1s 2ms/step - loss: 0.0842 - val_loss: 0.0828
    

    Prediction on the validation set

    pred_validation = model.predict([validation_x, np.zeros((10,56,1))])
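
    As a small follow-up (not part of the original answer): the predictions come back 3D, so you may want to squeeze the trailing feature axis before comparing them with validation_y:

    print(pred_validation.shape)              # (10, 56, 1)
    pred_2d = pred_validation.squeeze(-1)     # (10, 56), easier to plot or score
    mse = np.mean((pred_2d - validation_y.squeeze(-1))**2)
    print(mse)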
    
  • 2020-12-17 03:04

    The attention layer in Keras is not a trainable layer (unless you use the scale parameter); it only computes matrix operations. In my opinion, this layer can lead to mistakes if applied directly to time series, but let's proceed in order...

    The most natural choice to replicate the attention mechanism for our time-series problem is to adopt the solution presented here and explained again here. It's the classical application of attention in an encoder-decoder structure in NLP.

    Following the TF implementation, for our attention layer we need query, value and key tensors in 3D format. We obtain these values directly from our recurrent layer: more specifically, we use the sequence output and the hidden state. These are all we need to build an attention mechanism.

    The query is the output sequence: [batch_dim, time_step, features]

    The value is the hidden state [batch_dim, features], to which we add a temporal dimension for the matrix operations: [batch_dim, 1, features]

    As the key, we again use the hidden state, so key = value.
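
    As a minimal sketch (variable names are mine, assuming TF 2.x eager execution), this is how the two tensors can be obtained from a recurrent layer:

    import numpy as np
    import tensorflow as tf

    x = np.random.uniform(0, 1, (5, 20, 50)).astype('float32')   # (batch, time, feat)
    seq, state = tf.keras.layers.GRU(32, return_sequences=True,
                                     return_state=True)(x)
    print(seq.shape)    # (5, 20, 32) -> query
    print(state.shape)  # (5, 32)     -> value/key; expand with tf.expand_dims(state, 1)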

    In the definition above and in the Keras implementation, I found two problems:

    • The scores are calculated as softmax(dot(sequence, hidden)). The dot product is fine, but the softmax, following the Keras implementation, is computed over the last dimension and not over the temporal dimension. Since that last dimension has size 1, the scores are all 1, so they are useless.
    • The attention output is dot(scores, hidden) and not dot(scores, sequence) as we need.

    Here is an example:

    import numpy as np
    import tensorflow as tf

    def attention_keras(query_value):
        query, value = query_value # key == value
        score = tf.matmul(query, value, transpose_b=True) # (batch, timestamp, 1)
        score = tf.nn.softmax(score) # softmax on -1 axis ==> score always = 1 !!!
        print((score.numpy()!=1).any()) # False ==> score always = 1 !!!
        score = tf.matmul(score, value) # (batch, timestamp, feat)
        return score
    
    np.random.seed(33)
    time_steps = 20
    features = 50
    sample = 5
    
    X = np.random.uniform(0,5, (sample,time_steps,features))
    state = np.random.uniform(0,5, (sample,features))
    attention_keras([X,tf.expand_dims(state,1)]) # ==> the same as Attention(dtype='float64')([X,tf.expand_dims(state,1)])
    

    So, for these reasons, for time-series attention I propose this solution:

    def attention_seq(query_value, scale):
        query, value = query_value
        score = tf.matmul(query, value, transpose_b=True) # (batch, timestamp, 1)
        score = scale*score # scale with a fixed number (it can be fine-tuned or learned during training)
        score = tf.nn.softmax(score, axis=1) # softmax on timestamp axis
        score = score*query # (batch, timestamp, feat)
        return score
    
    np.random.seed(33)
    time_steps = 20
    features = 50
    sample = 5
    
    X = np.random.uniform(0,5, (sample,time_steps,features))
    state = np.random.uniform(0,5, (sample,features))
    attention_seq([X,tf.expand_dims(state,1)], scale=0.05)
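
    A quick check (my addition) that, unlike the Keras version above, the weights now vary across timesteps and sum to 1 along the time axis:

    weights = tf.nn.softmax(0.05 * tf.matmul(X, tf.expand_dims(state, 1), transpose_b=True), axis=1)
    print((weights.numpy() != 1).any())            # True: the weights are no longer all equal to 1
    print(tf.reduce_sum(weights, axis=1).numpy())  # ~1 for every sample: a proper distribution over time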
    

    Again, the query is the output sequence: [batch_dim, time_step, features]

    The value is the hidden state [batch_dim, features], with a temporal dimension added for the matrix operations: [batch_dim, 1, features]

    The weights are calculated as softmax(scale*dot(sequence, hidden)). The scale parameter is a scalar value used to scale the scores before applying the softmax operation; the softmax is now computed correctly over the time dimension. The attention output is the weighted product of the input sequence and the scores. I use the scale parameter as a fixed value, but it can be tuned or inserted as a learnable weight in a custom layer (like the scale parameter in the Keras Attention layer).
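
    For instance, a minimal sketch of such a custom layer (the class name ScaledTimeAttention and its details are my own illustration, not code from the answer):

    import tensorflow as tf

    class ScaledTimeAttention(tf.keras.layers.Layer):
        """Same logic as attention_seq, but with the scale as a trainable scalar."""
        def build(self, input_shape):
            self.scale = self.add_weight(name='scale', shape=(),
                                         initializer='ones', trainable=True)
            super().build(input_shape)

        def call(self, inputs):
            query, value = inputs                              # (batch, time, feat), (batch, 1, feat)
            score = tf.matmul(query, value, transpose_b=True)  # (batch, time, 1)
            score = tf.nn.softmax(self.scale * score, axis=1)  # softmax over the time axis
            return score * query                               # (batch, time, feat)

    # usage: att = ScaledTimeAttention()([seq, tf.expand_dims(state, 1)])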

    In terms of network implementation, these are the two possibilities available:

    ######### KERAS #########
    from tensorflow.keras.layers import Input, GRU, Attention, Lambda

    inp = Input((time_steps, features))
    seq, state = GRU(32, return_state=True, return_sequences=True)(inp)
    att = Attention()([seq, tf.expand_dims(state,1)])
    
    ######### CUSTOM #########
    inp = Input((time_steps,features))
    seq, state = GRU(32, return_state=True, return_sequences=True)(inp)
    att = Lambda(attention_seq, arguments={'scale': 0.05})([seq, tf.expand_dims(state,1)])
    

    CONCLUSION

    I don't know how much added value introducing an attention layer brings to simple problems. If you have short sequences, I suggest you leave everything as is. What I reported here is an answer where I express my considerations; I welcome comments about possible mistakes or misunderstandings.


    In your model, these solutions can be embedded like this:

    ######### KERAS #########
    from tensorflow.keras.layers import Input, GRU, Dense, Dropout, Attention, Lambda
    from tensorflow.keras.models import Model

    inp = Input((n_features, n_steps))
    seq, state = GRU(n_units, activation='relu',
                     return_state=True, return_sequences=True)(inp)
    att = Attention()([seq, tf.expand_dims(state,1)])
    x = GRU(n_units, activation='relu')(att)
    x = Dense(64, activation='relu')(x)
    x = Dropout(0.5)(x)
    out = Dense(n_steps_out)(x)
    
    model = Model(inp, out)
    model.compile(optimizer='adam', loss='mse', metrics=['mse'])
    model.summary()
    
    ######### CUSTOM #########
    inp = Input((n_features, n_steps))
    seq, state = GRU(n_units, activation='relu',
                     return_state=True, return_sequences=True)(inp)
    att = Lambda(attention_seq, arguments={'scale': 0.05})([seq, tf.expand_dims(state,1)])
    x = GRU(n_units, activation='relu')(att)
    x = Dense(64, activation='relu')(x)
    x = Dropout(0.5)(x)
    out = Dense(n_steps_out)(x)
    
    model = Model(inp, out)
    model.compile(optimizer='adam', loss='mse', metrics=['mse'])
    model.summary()
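
    If you want to quickly smoke-test one of these models, a sketch with made-up dimensions (the values below are placeholders, not from the question, and must be set before building the model):

    import numpy as np

    # Placeholder dimensions, assumed to be set before the model definition above:
    # n_features, n_steps, n_units, n_steps_out = 5, 20, 32, 56
    X_dummy = np.random.uniform(0, 1, (16, n_features, n_steps))
    y_dummy = np.random.uniform(0, 1, (16, n_steps_out))

    model.fit(X_dummy, y_dummy, epochs=1, verbose=0)
    print(model.predict(X_dummy).shape)  # (16, n_steps_out)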
    