CNTK: how do I create a sequence of tensors from a single tensor?

Submitted by 拥有回忆 on 2020-01-01 19:44:20

Question


I have a working TensorFlow seq2seq model I've been using for image captioning that I'd like to convert over to CNTK, but I'm having trouble getting the input to my LSTM in the right format.

Here's what I do in my TensorFlow network:

max_seq_length = 40
embedding_size = 512

self.x_img = tf.placeholder(tf.float32, [None, 2048])
self.x_txt = tf.placeholder(tf.int32, [None, max_seq_length])
self.y = tf.placeholder(tf.int32, [None, max_seq_length])        

with tf.device("/cpu:0"):
    image_embed_inputs = tf.layers.dense(inputs=self.x_img, units=embedding_size)
    image_embed_inputs = tf.reshape(image_embed_inputs, [-1, 1, embedding_size])
    image_embed_inputs = tf.contrib.layers.batch_norm(image_embed_inputs, center=True, scale=True, is_training=is_training, scope='bn')
    text_embedding = tf.Variable(tf.random_uniform([vocab_size, embedding_size], -init_scale, init_scale))
    text_embed_inputs = tf.nn.embedding_lookup(text_embedding, self.x_txt)

inputs = tf.concat([image_embed_inputs, text_embed_inputs], 1)        

I'm basically doing this:

I'm taking the last 2048-dim layer of a pretrained 50-layer ResNet as part of my input. I'm then embedding that in 512-dim space via a basic dense layer (image_embed_inputs).

Simultaneously, I have a 40-element long sequence of text tokens (x_txt) that I'm embedding into 512-dim space (text_embedding / text_embed_inputs).

I'm then concatenating them together into a [-1, 41, 512] tensor, which is the actual input to my LSTM. The first element ([-1, 0, 512]) is the image embedding, and the remaining 40 elements ([-1, 1:41, 512]) are the embeddings for each text token in my input sequence.
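The shape bookkeeping above can be sketched with plain NumPy (a batch size of 8 is used here as a made-up stand-in; the zeros are placeholder data, not real embeddings):

```python
import numpy as np

batch, seq_len, embed = 8, 40, 512

# image embedding reshaped into a length-1 "sequence" step
image_embed = np.zeros((batch, 1, embed), dtype=np.float32)
# embedded text tokens, one 512-dim vector per token
text_embed = np.zeros((batch, seq_len, embed), dtype=np.float32)

# concatenate along the time axis, as tf.concat([...], 1) does
lstm_input = np.concatenate([image_embed, text_embed], axis=1)
print(lstm_input.shape)  # (8, 41, 512)
```

Step 0 of the resulting sequence is the image embedding and steps 1 through 40 are the token embeddings, which is exactly the `[-1, 41, 512]` layout described above.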

Ultimately, that works & does what I need it to do in TensorFlow. Now, I'd like to do something similar in CNTK. I'm looking at the seq2seq tutorial but I haven't figured out how to set up the input for my CNTK LSTM yet.

I've taken the 2048-dim ResNet embedding, the 40-dim input text token sequence and the 40-dim label text token sequence, and stored them in CTF text format (concatenating the ResNet embedding and the input text token sequence together), so they can be read like so:

def create_reader(path, is_training, input_dim, label_dim):
    return MinibatchSource(CTFDeserializer(path, StreamDefs(
        features=StreamDef(field='x', shape=2088, is_sparse=True),
        labels=StreamDef(field='y', shape=40, is_sparse=False)
        )), randomize=is_training,
        max_sweeps=INFINITELY_REPEAT if is_training else 1)
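For reference, each line of that CTF file would look roughly like this (the `x` and `y` field names match the `StreamDefs` above; the indices and values shown are made up for illustration, with `x` written sparsely as index:value pairs and `y` as a dense list of 40 token ids):

```
|x 0:0.127 5:0.941 2087:12 |y 3 17 42 8 ... 9
```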

What I'd like to do at train/test time is take that features input tensor, break it back into a 2048-dim ResNet embedding and a 40-dim input text token sequence, and then set up CNTK sequence entities to feed into my network. So far, though, I haven't been able to figure out how to do that. This is where I am:

def lstm(input, embedding_dim, LSTM_dim, cell_dim, vocab_dim):
    x_image = C.slice(input, 0, 0, 2048)
    x_text = C.slice(input, 0, 2048, 2088)

    x_text_seq = sequence.input_variable(shape=[vocab_dim], is_sparse=False)

    # How do I get the data from x_text into x_text_seq?

    image_embedding = Embedding(embedding_dim)
    text_embedding = Embedding(x_text_seq)
    lstm_input = C.splice(image_embedding, text_embedding)

I'm not sure how to set up a sequence properly, though - any ideas?
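For what it's worth, the slice boundaries themselves are easy to sanity-check in plain NumPy, mirroring the two `C.slice` calls above on a dummy 2088-dim feature row:

```python
import numpy as np

# one dummy 2088-dim feature row: 2048 ResNet values + 40 token ids
combined = np.arange(2088, dtype=np.float32)

# mirrors C.slice(input, 0, 0, 2048) and C.slice(input, 0, 2048, 2088)
x_image = combined[0:2048]
x_text = combined[2048:2088]

print(x_image.shape, x_text.shape)  # (2048,) (40,)
```

So the splitting itself is fine; the open question is how to turn the 40 sliced values into a proper CNTK sequence along a dynamic axis rather than a static 40-dim vector.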

Source: https://stackoverflow.com/questions/47560578/cntk-how-do-i-create-a-sequence-of-tensors-from-a-single-tensor
