lstm

Keras: How should I prepare input data for an RNN?

十年热恋 submitted on 2019-11-29 23:17:47
I'm having trouble preparing input data for an RNN in Keras. Currently, my training data has dimensions (6752, 600, 13): 6752 is the number of training samples, 600 is the number of time steps, and 13 is the size of the feature vectors (the values are floats). X_train and Y_train both have this shape. I want to prepare this data to be fed into a SimpleRNN in Keras. Suppose we are going through the time steps, from step #0 to step #599. Let's say I want to use input_length = 5, which means that I want to use the 5 most recent inputs (e.g. steps #10, #11, #12, #13, #14 at step #14). How should I reshape X_train? Should it be (6752,
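A minimal sketch of one common way to build such windows with NumPy, assuming the goal is to predict the target at step t from steps t-4..t (the shapes and names below are illustrative, not from the original post):

```python
import numpy as np

# Hypothetical arrays matching the question: 6752 sequences, 600 steps, 13 features.
X_train = np.random.rand(6752, 600, 13).astype("float32")
Y_train = np.random.rand(6752, 600, 13).astype("float32")

input_length = 5  # use the 5 most recent steps to predict the current step

def make_windows(X, Y, window):
    """Slice each sequence into overlapping windows of `window` steps.

    The target for a window ending at step t is Y at step t.
    """
    xs, ys = [], []
    for seq_x, seq_y in zip(X, Y):
        for t in range(window - 1, seq_x.shape[0]):
            xs.append(seq_x[t - window + 1 : t + 1])  # steps t-4 .. t
            ys.append(seq_y[t])                       # target at step t
    return np.asarray(xs), np.asarray(ys)

# Small subset just for illustration.
X_win, Y_win = make_windows(X_train[:8], Y_train[:8], input_length)
print(X_win.shape)  # (8 * (600 - 5 + 1), 5, 13) -> (4768, 5, 13)
print(Y_win.shape)  # (4768, 13)
```

The resulting X_win could then be fed to a SimpleRNN layer with input_shape=(input_length, 13).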

A simple NER model implementation: CRF+LSTM

僤鯓⒐⒋嵵緔 submitted on 2019-11-29 21:25:17
A record of the whole process of implementing CRF+LSTM from scratch: survey the topic and understand the implementation process [1h30min]; build the model [3h30min]; read code to get a deeper grasp of the implementation details [4h]; implement it by hand [ ].
Background: NER. Neural networks have become models that can handle many NLP tasks effectively. These methods treat sequence labeling tasks (such as CWS, POS and NER) in a similar way: each token is mapped from a discrete one-hot representation to a dense embedding in a low-dimensional space, the sentence's embedding sequence is fed into an RNN, the network extracts features automatically, and a softmax predicts the label of each token. The drawback is that each token is labeled by an independent classification, so the labels already predicted for the preceding context cannot be used directly. To solve this, the LSTM+CRF model was proposed for sequence labeling: a CRF layer is attached after the LSTM layer to make sentence-level label predictions, so the tagging process is no longer an independent classification of each token.
LSTM. Source: "Neural Network Methods for NLP". LSTM, short for Long Short-Term Memory, is currently one of the most successful types of RNN architecture. As the name suggests, its main design goal is to address the vanishing and exploding gradient problems of RNNs. The gating mechanism allows gradients related to the memory component to be preserved for a long time, so the RNN performs better on longer sequences. The main technique is the introduction of the gate mechanism. Simply put, a gate uses a 0
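As a rough illustration of the plain BiLSTM + softmax tagger described above (before the CRF layer is added), here is a minimal PyTorch sketch; the vocabulary size, tag count and dimensions are made-up values, and a CRF layer (e.g. from the third-party pytorch-crf package) would replace the independent argmax at the end:

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Minimal BiLSTM tagger: embeddings -> BiLSTM -> per-token tag scores.

    A CRF layer would be attached on top of the emission scores to decode
    tags jointly at the sentence level instead of per token.
    """
    def __init__(self, vocab_size, num_tags, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim // 2,
                            batch_first=True, bidirectional=True)
        self.emit = nn.Linear(hidden_dim, num_tags)  # per-token emission scores

    def forward(self, token_ids):                # token_ids: (batch, seq_len)
        h, _ = self.lstm(self.embed(token_ids))  # (batch, seq_len, hidden_dim)
        return self.emit(h)                      # (batch, seq_len, num_tags)

# Hypothetical sizes for illustration.
model = BiLSTMTagger(vocab_size=5000, num_tags=9)
tokens = torch.randint(0, 5000, (2, 20))         # batch of 2 sentences, 20 tokens each
emissions = model(tokens)
# Independent per-token prediction (the drawback described above):
pred = emissions.argmax(dim=-1)                  # (2, 20)
```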

PyTorch - contiguous()

北城余情 submitted on 2019-11-29 20:16:44
I was going through this example of an LSTM language model on GitHub (link). What it does in general is pretty clear to me, but I'm still struggling to understand what calling contiguous() does, which occurs several times in the code. For example, in lines 74/75 of the code the input and target sequences of the LSTM are created. The data (stored in ids) is 2-dimensional, where the first dimension is the batch size.

for i in range(0, ids.size(1) - seq_length, seq_length):
    # Get batch inputs and targets
    inputs = Variable(ids[:, i:i+seq_length])
    targets = Variable(ids[:, (i+1):(i+1)+seq_length].contiguous())
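For intuition, here is a small stand-alone reproduction of why contiguous() is needed (not from the linked repository): transposing or slicing along a non-leading dimension only changes strides, and view() requires contiguous memory.

```python
import torch

x = torch.arange(12).reshape(3, 4)
y = x.t()                      # transpose shares storage, only the strides change
print(y.is_contiguous())       # False

# view() needs a contiguous tensor; this line would raise a RuntimeError:
# y.view(-1)

z = y.contiguous()             # copies the data into row-major (contiguous) order
print(z.is_contiguous())       # True
print(z.view(-1))              # reshaping now works
```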

What is the intuition behind using tanh in an LSTM?

人走茶凉 submitted on 2019-11-29 19:31:56
In an LSTM network (Understanding LSTMs), why do the input gate and output gate use tanh? What is the intuition behind this? Is it just a nonlinear transformation? If so, can I change both to another activation function (e.g. ReLU)? The sigmoid, specifically, is used as the gating function for the three gates (input, output, forget) in the LSTM: since it outputs a value between 0 and 1, it can let either no flow or complete flow of information through the gates. On the other hand, to overcome the vanishing gradient problem, we need a function whose second derivative can sustain for a long range before going to
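A toy NumPy sketch of the two roles (made-up numbers, not a full LSTM implementation): the sigmoid outputs act as soft switches in (0, 1), while tanh supplies bounded, zero-centred content.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy pre-activations for a single LSTM cell at one time step (3 hidden units).
f = sigmoid(np.array([-4.0, 0.0, 4.0]))   # forget gate in (0, 1): 0 blocks, 1 passes
i = sigmoid(np.array([ 2.0, 0.0, -2.0]))  # input gate in (0, 1)
g = np.tanh(np.array([-3.0, 0.5, 3.0]))   # candidate values in (-1, 1), zero-centred

c_prev = np.array([0.2, -0.5, 1.0])       # previous cell state
c_new = f * c_prev + i * g                # gates scale, tanh supplies bounded content

o = sigmoid(np.array([1.0, 1.0, 1.0]))    # output gate
h_new = o * np.tanh(c_new)                # hidden state: gated, squashed cell state

print(c_new, h_new)
```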

How to use return_sequences option and TimeDistributed layer in Keras?

限于喜欢 submitted on 2019-11-29 19:26:58
I have a dialog corpus like the one below, and I want to implement an LSTM model which predicts a system action. The system action is described as a bit vector, and the user input is computed as a word embedding, which is also a bit vector.
t1: user: "Do you know an apple?", system: "no" (action=2)
t2: user: "xxxxxx", system: "yyyy" (action=0)
t3: user: "aaaaaa", system: "bbbb" (action=5)
So what I want to realize is a "many to many (2)" model. When my model receives a user input, it must output a system action. But I cannot understand the return_sequences option and the TimeDistributed layer after the LSTM. To
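A minimal Keras sketch of the "one action prediction per turn" setup, assuming each turn is already encoded as a fixed-size embedding and the actions are one-hot encoded (all sizes are illustrative):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, TimeDistributed

n_turns, embed_dim, n_actions = 3, 128, 6   # hypothetical: 3 dialog turns, 6 actions

model = Sequential([
    # return_sequences=True -> one hidden state per time step: (batch, n_turns, 64)
    LSTM(64, return_sequences=True, input_shape=(n_turns, embed_dim)),
    # TimeDistributed applies the same Dense classifier to every time step,
    # giving one action prediction per turn: (batch, n_turns, n_actions)
    TimeDistributed(Dense(n_actions, activation="softmax")),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
```

With return_sequences=False the LSTM would instead return only the last hidden state, which fits a "many to one" model (one action for the whole dialog).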

PyTorch LSTM notes

柔情痞子 submitted on 2019-11-29 19:04:11
In PyTorch, the sentence length is not fixed when an LSTM is initialized; it can be adjusted dynamically. Only when training in batches do the sentences within a batch need to have the same length. In Keras, the sentence length must be passed in when the model is initialized, i.e. the number of unrolled LSTM cells, and this is part of the model's configuration. The LSTM's parameters are the weights inside the cell; the sentence length is not an LSTM parameter. Source: https://www.cnblogs.com/rise0111/p/11527323.html
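A small PyTorch sketch illustrating the point (shapes are illustrative): the same nn.LSTM accepts different sequence lengths across calls, while a single batch has to be padded to one common length.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

lstm = nn.LSTM(input_size=13, hidden_size=32, batch_first=True)

# The same LSTM handles sequences of different lengths across calls:
out_a, _ = lstm(torch.randn(1, 10, 13))   # a 10-step sentence
out_b, _ = lstm(torch.randn(1, 25, 13))   # a 25-step sentence
print(out_a.shape, out_b.shape)           # (1, 10, 32) and (1, 25, 32)

# Within one batch the tensor must be rectangular, so shorter sentences are padded;
# pack_padded_sequence lets the LSTM skip the padded positions.
batch = torch.zeros(2, 25, 13)            # 2 sentences padded to 25 steps
batch[0, :10] = torch.randn(10, 13)
batch[1, :25] = torch.randn(25, 13)
lengths = torch.tensor([10, 25])
packed = pack_padded_sequence(batch, lengths, batch_first=True, enforce_sorted=False)
out, _ = lstm(packed)
out, _ = pad_packed_sequence(out, batch_first=True)
print(out.shape)                          # (2, 25, 32)
```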

What is the correct procedure to split the data set for a classification problem?

坚强是说给别人听的谎言 submitted on 2019-11-29 18:07:26
I am new to machine learning and deep learning, and I would like to clarify a doubt about train_test_split before training. I have a data set of size (302, 100, 5), where (207, 100, 5) belongs to class 0 and (95, 100, 5) belongs to class 1. I would like to perform classification using an LSTM (since it is sequence data). How can I split my data set for training, given that the classes do not have an equal distribution? Option 1: consider the whole data [(302, 100, 5), both classes (0 & 1)], shuffle it, train_test_split, and proceed with training. Option 2: split both class data sets equally [(95, 100, 5) for class 0 & (95,100
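One common option, sketched here with scikit-learn (an assumption, and the arrays are randomly generated for illustration): a stratified split keeps the 207:95 class ratio roughly the same in the train and test sets, so the minority class is not under-represented.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data matching the question: 302 sequences of shape (100, 5).
X = np.random.rand(302, 100, 5)
y = np.concatenate([np.zeros(207), np.ones(95)])   # 207 class-0, 95 class-1

# stratify=y preserves the class ratio in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, shuffle=True, random_state=42
)
print(np.bincount(y_train.astype(int)), np.bincount(y_test.astype(int)))
```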

Cyclic computational graphs with Tensorflow or Theano

非 Y 不嫁゛ submitted on 2019-11-29 15:44:13
Neither TensorFlow nor Theano seems to support cyclic computational graphs; cyclic elements are implemented as recurrent cells with a buffer and unrolling (RNN / LSTM cells), but this limitation is mostly related to the computation of back-propagation. I don't have a particular need to compute back-propagation, just the forward pass. Is there a way to work around this limitation, or perhaps to break arbitrary computational graphs down into acyclic components? TensorFlow does support cyclic computation graphs. The tf.while_loop() function allows you to specify a while loop with
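A minimal tf.while_loop sketch (TF2-style, with illustrative numbers) showing how an iterative, "cyclic" computation is expressed as a loop over loop variables rather than as a literal cycle in the graph:

```python
import tensorflow as tf

# Repeatedly square a value until it exceeds 100.
def cond(i, x):
    return x < 100.0

def body(i, x):
    return i + 1, x * x

i_final, x_final = tf.while_loop(cond, body,
                                 loop_vars=(tf.constant(0), tf.constant(2.0)))
print(int(i_final), float(x_final))   # 3 iterations: 2 -> 4 -> 16 -> 256
```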

How to reshape text data to be suitable for an LSTM model in Keras

僤鯓⒐⒋嵵緔 submitted on 2019-11-29 14:16:45
Update 1: The code I'm referring to is exactly the code in the book, which you can find here. The only thing is that I don't want to have embed_size in the decoder part. That's why I think I don't need an embedding layer at all, because if I add an embedding layer, I need to have embed_size in the decoder part (please correct me if I'm wrong). Overall, I'm trying to adapt the same code without using the embedding layer, because I need to have vocab_size in the decoder part. I think the suggestion provided in the comment could be correct (using one_hot_encoding); however, I faced this error:
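A small sketch of the one-hot route mentioned in the comment, assuming Keras' to_categorical is acceptable (the vocabulary size and sequence length below are made up): each token id is expanded to a one-hot vector of length vocab_size, so no Embedding layer is needed and the decoder can work directly with vocab_size.

```python
import numpy as np
from tensorflow.keras.utils import to_categorical

vocab_size, seq_len = 50, 7
token_ids = np.random.randint(0, vocab_size, size=(4, seq_len))  # (batch, seq_len)

one_hot = to_categorical(token_ids, num_classes=vocab_size)      # (batch, seq_len, vocab_size)
print(one_hot.shape)  # (4, 7, 50) -> ready for an LSTM with input_shape=(seq_len, vocab_size)
```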

How do you pass video features from a CNN to an LSTM?

a 夏天 submitted on 2019-11-29 14:08:25
Question: After you pass a video frame through a convnet and get an output feature map, how do you pass that data into an LSTM? Also, how do you pass multiple frames to the LSTM through the CNN? In other words, I want to process video frames with a CNN to get the spatial features, then pass these features to an LSTM to do temporal processing on the spatial features. How do I connect the LSTM to the video features? For example, if the input video is 56x56, then when passed through all of the CNN
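One common pattern is to wrap the CNN layers in TimeDistributed so the same CNN runs on every frame, and then feed the per-frame feature vectors to an LSTM. A minimal Keras sketch under that assumption (the frame count, filter sizes and class count are illustrative, not from the question):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Flatten,
                                     TimeDistributed, LSTM, Dense)

frames, height, width, channels = 16, 56, 56, 3   # hypothetical clip: 16 RGB frames of 56x56

model = Sequential([
    # TimeDistributed applies the same CNN to every frame independently,
    # turning (batch, frames, 56, 56, 3) into (batch, frames, feature_dim).
    TimeDistributed(Conv2D(32, (3, 3), activation="relu"),
                    input_shape=(frames, height, width, channels)),
    TimeDistributed(MaxPooling2D((2, 2))),
    TimeDistributed(Flatten()),
    # The LSTM models how the per-frame spatial features evolve over time.
    LSTM(64),
    Dense(10, activation="softmax"),               # e.g. 10 hypothetical action classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
```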