Siamese Model with LSTM network fails to train using tensorflow

Dataset Description

The dataset contains a set of question pairs and a label which tells if the questions are same. e.g.

"How do I read and find my YouTube comments?" , "How can I see all my Youtube comments?" , "1"

The goal of the model is to identify if the given question pair is same or different.

Approach

I have created a Siamese network to identify if two questions are same. Following is the model:

graph = tf.Graph()

with graph.as_default():
    embedding_placeholder = tf.placeholder(tf.float32, shape=embedding_matrix.shape, name='embedding_placeholder')
    with tf.variable_scope('siamese_network') as scope:
        labels = tf.placeholder(tf.int32, [batch_size, None], name='labels')
        keep_prob = tf.placeholder(tf.float32, name='question1_keep_prob')

        with tf.name_scope('question1') as question1_scope:
            question1_inputs = tf.placeholder(tf.int32, [batch_size, seq_len], name='question1_inputs')

            question1_embedding = tf.get_variable(name='embedding', initializer=embedding_placeholder, trainable=False)
            question1_embed = tf.nn.embedding_lookup(question1_embedding, question1_inputs)

            question1_lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
            question1_drop = tf.contrib.rnn.DropoutWrapper(question1_lstm, output_keep_prob=keep_prob)
            question1_multi_lstm = tf.contrib.rnn.MultiRNNCell([question1_drop] * lstm_layers)

            q1_initial_state = question1_multi_lstm.zero_state(batch_size, tf.float32)

            question1_outputs, question1_final_state = tf.nn.dynamic_rnn(question1_multi_lstm, question1_embed, initial_state=q1_initial_state)

        scope.reuse_variables()

        with tf.name_scope('question2') as question2_scope:
            question2_inputs = tf.placeholder(tf.int32, [batch_size, seq_len], name='question2_inputs')

            question2_embedding = question1_embedding
            question2_embed = tf.nn.embedding_lookup(question2_embedding, question2_inputs)

            question2_lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
            question2_drop = tf.contrib.rnn.DropoutWrapper(question2_lstm, output_keep_prob=keep_prob)
            question2_multi_lstm = tf.contrib.rnn.MultiRNNCell([question2_drop] * lstm_layers)

            q2_initial_state = question2_multi_lstm.zero_state(batch_size, tf.float32)

            question2_outputs, question2_final_state = tf.nn.dynamic_rnn(question2_multi_lstm, question2_embed, initial_state=q2_initial_state)

Calculate the cosine distance using the RNN outputs:

with graph.as_default():
    diff = tf.sqrt(tf.reduce_sum(tf.square(tf.subtract(question1_outputs[:, -1, :], question2_outputs[:, -1, :])), reduction_indices=1))

    margin = tf.constant(1.) 
    labels = tf.to_float(labels)
    match_loss = tf.expand_dims(tf.square(diff, 'match_term'), 0)
    mismatch_loss = tf.expand_dims(tf.maximum(0., tf.subtract(margin, tf.square(diff)), 'mismatch_term'), 0)

    loss = tf.add(tf.matmul(labels, match_loss), tf.matmul((1 - labels), mismatch_loss), 'loss_add')
    distance = tf.reduce_mean(loss)

    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(distance)

Following is the code to train the model:

with graph.as_default():
    saver = tf.train.Saver()

with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer(), feed_dict={embedding_placeholder: embedding_matrix})

    iteration = 1
    for e in range(epochs):
        summary_writer = tf.summary.FileWriter('/Users/mithun/projects/kaggle/quora_question_pairs/logs', sess.graph)
        summary_writer.add_graph(sess.graph)

        for ii, (x1, x2, y) in enumerate(get_batches(question1_train, question2_train, label_train, batch_size), 1):
            feed = {question1_inputs: x1,
                    question2_inputs: x2,
                    labels: y[:, None],
                    keep_prob: 0.9
                   }
            loss1 = sess.run([distance], feed_dict=feed)

            if iteration%5==0:
                print("Epoch: {}/{}".format(e, epochs),
                      "Iteration: {}".format(iteration),
                      "Train loss: {:.3f}".format(loss1))

            if iteration%50==0:
                val_acc = []
                for x1, x2, y in get_batches(question1_val, question2_val, label_val, batch_size):
                    feed = {question1_inputs: x1,
                            question2_inputs: x2,
                            labels: y[:, None],
                            keep_prob: 1
                           }
                    batch_acc = sess.run([accuracy], feed_dict=feed)
                    val_acc.append(batch_acc)
                print("Val acc: {:.3f}".format(np.mean(val_acc)))
            iteration +=1

    saver.save(sess, "checkpoints/quora_pairs.ckpt")

I have trained the above model with about 10,000 labeled data. But, the accuracy is stagnant at around 0.630 and strangely the validation accuracy is same across all the iterations.

lstm_size = 64
lstm_layers = 1
batch_size = 128
learning_rate = 0.001

Is there anything wrong with the way I have created the model?

This is a common problem with imbalanced datasets like the recently released Quora dataset which you are using. Since the Quora dataset is imbalanced (~63% negative and ~37% positive examples) you need proper initialization of weights. Without weight initialization your solution will be stuck in a local minima and it will train to predict only the negative class. Hence the 63% accuracy, because that is the percentage of 'not similar' questions in your validation data. If you check the results obtained on your validation set you will notice that it predicts all zeros. A truncated normal distribution proposed in He et al., http://arxiv.org/abs/1502.01852 is a good alternate for initializing the weights.

来源：https://stackoverflow.com/questions/44116689/siamese-model-with-lstm-network-fails-to-train-using-tensorflow

标签

tensorflow

lstm

recurrent-neural-network