LSTM

Strange loss curve while training LSTM with Keras

岁酱吖の submitted on 2019-12-05 13:54:56
I'm trying to train an LSTM for a binary classification problem. When I plot the loss curve after training, there are strange spikes in it. Here are some examples. Here is the basic code: model = Sequential() model.add(recurrent.LSTM(128, input_shape = (columnCount,1), return_sequences=True)) model.add(Dropout(0.5)) model.add(recurrent.LSTM(128, return_sequences=False)) model.add(Dropout(0.5)) model.add(Dense(1)) model.add(Activation('sigmoid')) model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy']) new_train = X_train[..., newaxis] history = model.fit(new_train,
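For reference, a minimal runnable reconstruction of the model described above (a sketch only: columnCount and the training arrays are placeholders assumed here, not the poster's data):

    import numpy as np
    from keras.models import Sequential
    from keras.layers import LSTM, Dropout, Dense, Activation

    columnCount = 20                                   # assumed feature count
    X_train = np.random.rand(256, columnCount)         # placeholder training data
    y_train = np.random.randint(0, 2, size=(256, 1))   # placeholder binary labels

    model = Sequential()
    model.add(LSTM(128, input_shape=(columnCount, 1), return_sequences=True))
    model.add(Dropout(0.5))
    model.add(LSTM(128, return_sequences=False))
    model.add(Dropout(0.5))
    model.add(Dense(1))
    model.add(Activation('sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

    # Add a trailing feature axis so each sample becomes a (columnCount, 1) sequence
    new_train = X_train[..., np.newaxis]
    history = model.fit(new_train, y_train, epochs=5, batch_size=32)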

TensorFlow LSTM Generative Model

心已入冬 submitted on 2019-12-05 12:53:33
I'm working off the LSTM language model tutorial discussed here. With language models, it's common to use the model to generate a new sentence from scratch after training (i.e. sample from the model). I'm new to TensorFlow, but I'm trying to use my trained model to generate new words until the end-of-sentence marker. My initial attempt: x = tf.zeros_like(m.input_data) state = m.initial_state.eval() for step in xrange(m.num_steps): state = session.run(m.final_state, {m.input_data: x, m.initial_state: state}) x = state It fails with the error: ValueError: setting an array element with a sequence.
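The tutorial's graph objects are not shown here, so the following is only a schematic sampling loop built on a hypothetical predict_step(token_id, state) wrapper around session.run(); the key point is that the next input must be a sampled word id, not the returned LSTM state:

    import numpy as np

    def sample_sentence(predict_step, start_id, eos_id, initial_state, max_steps=50):
        """Sample words one at a time until the end-of-sentence marker.

        predict_step(token_id, state) -> (probs, new_state) is assumed to run one
        step of the trained LSTM and return next-word probabilities.
        """
        token, state, sentence = start_id, initial_state, []
        for _ in range(max_steps):
            probs, state = predict_step(token, state)            # one LSTM step
            token = int(np.random.choice(len(probs), p=probs))   # draw next word
            if token == eos_id:                                  # stop at <eos>
                break
            sentence.append(token)
        return sentence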

LSTM and GRU

…衆ロ難τιáo~ submitted on 2019-12-05 11:42:15
LSTM
Input gate \(i_t\): controls to what extent the newly computed state is written into the memory cell.
Forget gate \(f_t\): controls to what extent the information in the previous memory cell is forgotten.
Output gate \(o_t\): controls to what extent the current output depends on the current memory cell.
Memory cell \(c_t\): every unit has one.
Update equations
Input gate: \[i_t=\sigma(W_i x_t + U_i h_{t-1} + b_i)\]
Forget gate: \[f_t=\sigma(W_f x_t + U_f h_{t-1} + b_f)\]
Output gate: \[o_t=\sigma(W_o x_t + U_o h_{t-1} + b_o)\]
Candidate memory cell: \[\tilde{c}_t=\tanh(W_c x_t + U_c h_{t-1})\]
Memory cell update: \[c_t=f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\]
Hidden output update: \[h_t=o_t \odot \tanh(c_t)\] (a NumPy sketch of one full step is given at the end of this entry)
The forget gate and the input gate control the long- and short-term memory, which makes it easier to learn long-range dependencies in a sequence.
Activation functions: with ReLU it is hard to realize a gating effect, because ReLU is fully closed on the negative half-axis and the positive half-axis carries no gating meaning. For the gates themselves, the sigmoid function is the near-universal choice in modern neural network modules; on devices with limited compute, 0/1 gates (hard gates) can be used.
GRU
Update gate \(z_t\)
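As a sanity check on the LSTM equations above, here is a minimal NumPy implementation of a single time step (the shapes, and the bias on the candidate cell, are assumptions of this sketch rather than part of the formulas above):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x_t, h_prev, c_prev, W, U, b):
        """One LSTM time step; W, U, b are dicts keyed by 'i', 'f', 'o', 'c'."""
        i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])      # input gate
        f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])      # forget gate
        o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])      # output gate
        c_tilde = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])  # candidate cell
        c_t = f_t * c_prev + i_t * c_tilde                          # cell update
        h_t = o_t * np.tanh(c_t)                                    # hidden output
        return h_t, c_t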

How to merge two LSTM layers in Keras

廉价感情. submitted on 2019-12-05 10:57:29
I'm working with Keras on a sentence similarity task (using the STS dataset) and am having problems merging the layers. The data consists of 1184 sentence pairs, each scored between 0 and 5. Below are the shapes of my numpy arrays. I've padded each of the sentences to 50 words and run them through an embedding layer, using the GloVe embeddings with 100 dimensions. When merging the two networks I'm getting an error: Exception: Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 1 arrays but instead
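That particular exception usually means model.fit() received a single array while the model declares two inputs. Below is a hedged sketch of one way to wire a shared-LSTM two-branch model in the Keras functional API and fit it with a list of two arrays; all sizes are assumptions based on the excerpt, not the poster's actual setup:

    import numpy as np
    from keras.models import Model
    from keras.layers import Input, Embedding, LSTM, Dense, concatenate

    vocab_size, seq_len, embed_dim = 20000, 50, 100     # assumed sizes

    left_in = Input(shape=(seq_len,))
    right_in = Input(shape=(seq_len,))

    embed = Embedding(vocab_size, embed_dim, input_length=seq_len)  # shared embedding
    encoder = LSTM(64)                                              # shared LSTM

    merged = concatenate([encoder(embed(left_in)), encoder(embed(right_in))])
    score = Dense(1)(merged)                            # similarity score in [0, 5]

    model = Model(inputs=[left_in, right_in], outputs=score)
    model.compile(optimizer='adam', loss='mse')

    # Two declared inputs, so fit() must receive a LIST of two arrays:
    X_left = np.random.randint(0, vocab_size, size=(1184, seq_len))
    X_right = np.random.randint(0, vocab_size, size=(1184, seq_len))
    y = np.random.uniform(0, 5, size=(1184, 1))
    model.fit([X_left, X_right], y, epochs=2, batch_size=32)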

How to generate a sentence from a feature vector or words?

二次信任 submitted on 2019-12-05 10:17:27
I used the VGG 16-layer Caffe model for image captioning and I have several captions per image. Now I want to generate a sentence from those captions (words). I read in a paper on LSTMs that I should remove the SoftMax layer from the training network and feed the 4096-dimensional feature vector from the fc7 layer directly to the LSTM. I am new to LSTM and RNN stuff. Where should I begin? Is there any tutorial showing how to generate a sentence by sequence labeling? AFAIK the master branch of BVLC/caffe does not yet support a recurrent layer architecture. You should pull the recurrent branch from jeffdonahue/caffe . This
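Independent of the Caffe branch question, the decoding step itself is small. Here is a sketch of greedy caption generation from a fixed fc7 feature vector, assuming a hypothetical lstm_decoder_step() wrapper around whatever recurrent implementation ends up being used:

    import numpy as np

    def generate_caption(fc7_features, lstm_decoder_step, start_id, eos_id, max_len=20):
        """Greedily decode a caption conditioned on a 4096-d fc7 feature vector.

        lstm_decoder_step(word_id, state, features) -> (word_scores, new_state)
        is a hypothetical wrapper around the trained captioning network.
        """
        state, word, caption = None, start_id, []
        for _ in range(max_len):
            scores, state = lstm_decoder_step(word, state, fc7_features)
            word = int(np.argmax(scores))     # most likely next word
            if word == eos_id:
                break
            caption.append(word)
        return caption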

Deep Learning - CNN+RNN Notes

[亡魂溺海] submitted on 2019-12-05 09:51:52
What follows is only a brief overview; there are many more articles on CNN+RNN (LSTM, GRU) applications, and the research directions go well beyond the three listed below: 1. CNN feature extraction feeding an RNN for sentence generation / image captioning. 2. RNN feature extraction feeding a CNN for content classification / video classification. 3. CNN feature extraction for dialogue and visual question answering. There are many other areas as well, such as judging emotion from facial expressions, labeling remote-sensing maps, parsing biomedical images, and real-time fire monitoring in the security field. At this stage the published work on CNN+RNN is even more varied and the results keep improving; most of these papers can be found through Google Scholar and downloaded for free (the paywalled ones are another matter).
CNN vs. RNN
Intuitive comparison of the convolutional neural network (CNN) and the recurrent neural network (RNN):
Similarities:
  Both are extensions of the traditional neural network.
  A forward pass produces the result; a backward pass updates the model.
  Each layer can hold many neurons side by side, and many layers can be stacked vertically.
Differences:
  CNN extends in space, convolving neurons with features; RNN extends in time, computing neurons over multiple time-step outputs.
  RNN can describe outputs over states that are continuous in time and has memory; CNN is used for static outputs.
  CNNs can go 100+ layers deep; RNN depth is limited.
Ways to combine CNN+RNN
1. CNN feature extraction feeding an RNN for sentence generation / image captioning (a rough wiring sketch follows at the end of this entry).
2. RNN feature extraction feeding a CNN for content classification / video classification.
3. CNN feature extraction for dialogue / visual question answering.
Concrete applications
1. Image captioning. Basic idea: the goal is to produce a caption sentence, which is a sentence-generation task
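As an illustration of combination 1 (CNN features feeding an RNN for image captioning), here is a rough wiring sketch in Keras; the layer sizes, vocabulary, and the 4096-d feature input are assumptions for illustration, not taken from any particular paper:

    from keras.models import Model
    from keras.layers import Input, Dense, Embedding, LSTM, add

    vocab_size, embed_dim, max_len = 10000, 256, 20    # assumed sizes

    # Image branch: a precomputed CNN feature vector projected to 256 dims
    img_in = Input(shape=(4096,))
    img_vec = Dense(256, activation='relu')(img_in)

    # Text branch: the partial caption generated so far
    cap_in = Input(shape=(max_len,))
    cap_emb = Embedding(vocab_size, embed_dim, mask_zero=True)(cap_in)
    cap_vec = LSTM(256)(cap_emb)

    # Merge both branches and predict the next word of the caption
    merged = add([img_vec, cap_vec])
    out = Dense(vocab_size, activation='softmax')(merged)

    model = Model(inputs=[img_in, cap_in], outputs=out)
    model.compile(optimizer='adam', loss='categorical_crossentropy')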

Dimension Mismatch in LSTM Keras

无人久伴 submitted on 2019-12-05 09:39:08
I want to create a basic RNN that can add two bytes. Here are the inputs and outputs expected for a simple addition: X = [[0, 0], [0, 1], [1, 1], [0, 1], [1, 0], [1, 0], [1, 1], [1, 0]] That is, X1 = 00101111 and X2 = 01110010 Y = [1, 0, 1, 0, 0, 0, 0, 1] I created the following sequential model model = Sequential() model.add(GRU(output_dim = 16, input_length = 2, input_dim = 8)) model.add(Activation('relu')) model.add(Dense(2, activation='softmax')) model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy']) model.summary() The error I get is something
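The listed data suggests 8 time steps with 2 input bits and 1 output bit per step, while the model declares input_length=2, input_dim=8 and a single 2-way softmax. Below is a hedged rearrangement of the shapes (my reading of the excerpt, not the poster's eventual fix):

    import numpy as np
    from keras.models import Sequential
    from keras.layers import GRU, TimeDistributed, Dense

    # One training example: 8 time steps, 2 input bits per step, 1 output bit per step
    X = np.array([[[0, 0], [0, 1], [1, 1], [0, 1], [1, 0], [1, 0], [1, 1], [1, 0]]])
    Y = np.array([[[1], [0], [1], [0], [0], [0], [0], [1]]])

    model = Sequential()
    model.add(GRU(16, input_shape=(8, 2), return_sequences=True))  # keep per-step outputs
    model.add(TimeDistributed(Dense(1, activation='sigmoid')))     # one bit per time step
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.summary()
    model.fit(X, Y, epochs=10, verbose=0)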

LSTM & GRU: Principles and PyTorch Implementation

◇◆丶佛笑我妖孽 submitted on 2019-12-05 09:04:11
1. LSTM & GRU principles: https://blog.csdn.net/jerr__y/article/details/58598296 https://github.com/starflyyy/Gated-Recurrent-Unit-GRU
2. Multi-layer LSTM: PyTorch has a num_layers argument. Although parameters are shared across time steps, the network still has distinct cells per layer, so num_layers is effectively the number of hidden layers; the cells are chained much like an MLP, i.e. a StackedRNN, as shown in the figure below (a small PyTorch check follows at the end of this entry).
3. Bidirectional RNN: still to be studied further; https://blog.csdn.net/jojozhangju/article/details/51982254
4. Implementing and modifying LSTM & RNN: source-code analysis at https://zhuanlan.zhihu.com/p/32103001 , on top of which you can rewrite your own RNN cell; https://github.com/huyingxi/new-LSTM-Cell is one implementation, but its multi-layer LSTM is not quite right; https://github.com/emadRad/lstm-gru-pytorch/
Source: https://www.cnblogs.com/yutingmoran/p/11917894.html
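Regarding point 2, a small PyTorch check that num_layers stacks distinct cells (sizes here are arbitrary):

    import torch
    import torch.nn as nn

    # num_layers=2 stacks two LSTM layers: the second layer consumes the first
    # layer's hidden-state sequence, i.e. the StackedRNN arrangement described above.
    lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, batch_first=True)

    x = torch.randn(4, 7, 10)          # (batch, seq_len, input_size)
    out, (h_n, c_n) = lstm(x)

    print(out.shape)   # torch.Size([4, 7, 20]) -> outputs of the top layer only
    print(h_n.shape)   # torch.Size([2, 4, 20]) -> one final hidden state per layer
    # Each layer has its own weight tensors, so parameters are not shared across layers:
    print([name for name, _ in lstm.named_parameters()])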

Bi-directional LSTM for variable-length sequence in Tensorflow

▼魔方 西西 submitted on 2019-12-05 07:46:05
I want to train a bi-directional LSTM in TensorFlow to perform a sequence classification problem (sentiment classification). Because the sequences have variable lengths, batches are normally padded with zero vectors. Normally, I use the sequence_length parameter in the uni-directional RNN to avoid training on the padding vectors. How can this be managed with a bi-directional LSTM? Does the sequence_length parameter automatically make the backward direction start from the correct (advanced) position in the sequence? Thank you. bidirectional_dynamic_rnn also has a sequence_length parameter that takes
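A minimal TF 1.x sketch of the answer being quoted, passing sequence_length to tf.nn.bidirectional_dynamic_rnn so that neither direction runs over the zero padding (shapes are placeholders):

    import tensorflow as tf   # TF 1.x API

    batch_size, max_len, input_dim, hidden = 32, 100, 300, 128

    inputs = tf.placeholder(tf.float32, [batch_size, max_len, input_dim])
    seq_len = tf.placeholder(tf.int32, [batch_size])   # true length of each sequence

    cell_fw = tf.nn.rnn_cell.LSTMCell(hidden)
    cell_bw = tf.nn.rnn_cell.LSTMCell(hidden)

    # sequence_length makes the forward pass stop at each sequence's true end and
    # the backward pass start there, so padding never influences the states.
    (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(
        cell_fw, cell_bw, inputs, sequence_length=seq_len, dtype=tf.float32)

    outputs = tf.concat([out_fw, out_bw], axis=-1)      # [batch, max_len, 2 * hidden]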

ctc_loss error “No valid path found.”

梦想的初衷 submitted on 2019-12-05 06:38:34
Training a model with tf.nn.ctc_loss produces an error every time the train op is run: tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found. Unlike in previous questions about this function, this is not due to divergence: I have a low learning rate, and the error occurs even on the first train op. The model is a CNN -> LSTM -> CTC. Here is the model creation code: # Build Graph self.videoInput = tf.placeholder(shape=(None, self.maxVidLen, 50, 100, 3), dtype=tf.float32) self.videoLengths = tf.placeholder(shape=(None), dtype=tf.int32) self.keep_prob = tf.placeholder(dtype=tf
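"No valid path found" from tf.nn.ctc_loss commonly means that some label sequence is longer than the number of time steps the model emits for that example (CTC needs input length >= label length, and strictly greater when the label repeats consecutive symbols, because a blank must separate them). Below is a small standalone check along those lines; the variable names are placeholders, not the poster's:

    import numpy as np

    def check_ctc_feasibility(label_lens, input_lens):
        """Return indices of examples whose labels CTC cannot align.

        CTC requires input_len >= label_len for every example (and more slack
        when the label contains repeated consecutive symbols).
        """
        label_lens = np.asarray(label_lens)
        input_lens = np.asarray(input_lens)
        bad = np.where(label_lens > input_lens)[0]
        if len(bad):
            print("Labels longer than the emitted time dimension at indices:", bad)
        return bad

    # Example: per-video transcript lengths vs. per-video output frame counts
    check_ctc_feasibility(label_lens=[12, 30, 8], input_lens=[20, 25, 16])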