LSTM

Initializing LSTM hidden state in TensorFlow/Keras

◇◆丶佛笑我妖孽 submitted on 2019-11-27 02:22:44
Question: Can someone explain how I can initialize the hidden state of an LSTM in TensorFlow? I am trying to build an LSTM recurrent auto-encoder, so after I have that model trained I want to transfer the learned hidden state of the unsupervised model to the hidden state of the supervised model. Is that even possible with the current API? This is the paper I am trying to recreate: http://papers.nips.cc/paper/5949-semi-supervised-sequence-learning.pdf Answer 1: Yes - this is possible but truly cumbersome. Let's go through an example.
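
The answer's worked example is cut off above. As a stand-in, here is a minimal sketch of one way to do this in Keras (my own illustration, not the answer's code; the layer sizes, the Dense head, and the zero placeholder states are assumptions): the functional API lets the learned hidden and cell states be fed into the supervised model as extra inputs via initial_state.

    import numpy as np
    from tensorflow.keras.layers import Input, LSTM, Dense
    from tensorflow.keras.models import Model

    timesteps, features, units = 20, 8, 32

    inputs = Input(shape=(timesteps, features))
    init_h = Input(shape=(units,))   # hidden state taken from the pretrained auto-encoder
    init_c = Input(shape=(units,))   # cell state taken from the pretrained auto-encoder

    # initial_state expects [h, c] for an LSTM layer
    x = LSTM(units)(inputs, initial_state=[init_h, init_c])
    outputs = Dense(1, activation='sigmoid')(x)

    model = Model([inputs, init_h, init_c], outputs)
    model.compile(optimizer='adam', loss='binary_crossentropy')

    # At training time the states would come from the pretrained encoder;
    # zeros are used here only to make the sketch runnable.
    x_batch = np.random.rand(4, timesteps, features).astype('float32')
    y_batch = np.random.randint(0, 2, size=(4, 1))
    h0 = np.zeros((4, units), dtype='float32')
    c0 = np.zeros((4, units), dtype='float32')
    model.train_on_batch([x_batch, h0, c0], y_batch)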

Text Recognition with Deep Learning

浪尽此生 submitted on 2019-11-27 01:03:38
Blog: https://blog.csdn.net/implok/article/details/95041472
Pipeline: Text recognition is an important AI application scenario. The recognition process generally consists of image input, preprocessing, text detection, text recognition, and result output.
Categories: Different recognition methods are used depending on the characteristics of the text to be recognized; broadly there are two classes, fixed-length text and variable-length text. For fixed-length text (e.g. CAPTCHAs) the number of characters is fixed, so a relatively simple network structure suffices and recognition is fairly easy. For variable-length text (e.g. printed text, billboard text) the number of characters is not fixed, so a more complex network structure and post-processing steps are needed, and recognition is harder.
1. Fixed-length text recognition. Recognizing fixed-length text is relatively simple and its application scenarios are fairly limited; the most typical one is CAPTCHA recognition. Because the number of characters is known and fixed, the network can be simple: three convolutional layers plus two fully connected layers are usually enough (see the sketch below).
2. Variable-length text recognition. Variable-length text is everywhere in practice, e.g. printed text and billboard text. Because the number of characters is neither fixed nor known in advance, recognition is harder, and this is the main direction of current text-recognition research. Common methods for variable-length text include LSTM+CTC, CRNN, and chineseocr. Method 1: LSTM+CTC. (1) What is an LSTM? To recognize variable-length text, a more capable model is needed: one with a degree of memory that can process information of arbitrary length step by step in time order. That model is
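
As a concrete illustration of the "three conv + two dense" recipe for fixed-length text, here is a minimal Keras sketch (my own code, not the blog's; the 60x160 grayscale input, the 4-character CAPTCHA, and the 36-character alphabet are assumptions):

    from tensorflow.keras import layers, models

    num_chars, num_classes = 4, 36   # 4 characters, each drawn from 36 symbols

    model = models.Sequential([
        layers.Conv2D(32, 3, activation='relu', input_shape=(60, 160, 1)),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(256, activation='relu'),
        # one softmax per character position
        layers.Dense(num_chars * num_classes),
        layers.Reshape((num_chars, num_classes)),
        layers.Activation('softmax'),
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy')
    # labels are one-hot encoded with shape (batch, num_chars, num_classes)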

When does Keras reset an LSTM state?

僤鯓⒐⒋嵵緔 submitted on 2019-11-27 00:57:54
I read all sorts of texts about it, and none seem to answer this very basic question. It's always ambiguous: in a stateful = False LSTM layer, does Keras reset states after each sequence, or after each batch? Suppose I have X_train shaped as (1000, 20, 1), meaning 1000 sequences of 20 steps of a single value. If I make: model.fit(X_train, y_train, batch_size=200, nb_epoch=15) Will it reset states for every single sequence (resets states 1000 times)? Or will it reset states for every batch (resets states 5 times)? Checking with some tests, I got to the following conclusion, which is according to the
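
For reference, with stateful=False every sample in every batch starts from a fresh zero state, so nothing carries over between batches or epochs; only with stateful=True do you manage the resets yourself. A minimal sketch contrasting the two modes (my illustration; the layer width of 16 and the toy data are assumptions):

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense

    X_train = np.random.rand(1000, 20, 1)
    y_train = np.random.rand(1000, 1)

    # Default (stateless): each sample in each batch gets a zero initial state.
    stateless = Sequential([LSTM(16, input_shape=(20, 1)), Dense(1)])
    stateless.compile(optimizer='adam', loss='mse')
    stateless.fit(X_train, y_train, batch_size=200, epochs=15)

    # Stateful: state persists across batches until reset_states() is called.
    stateful = Sequential([
        LSTM(16, batch_input_shape=(200, 20, 1), stateful=True),
        Dense(1),
    ])
    stateful.compile(optimizer='adam', loss='mse')
    for epoch in range(15):
        stateful.fit(X_train, y_train, batch_size=200, epochs=1, shuffle=False)
        stateful.reset_states()   # reset by hand between epochs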

In Keras, what exactly am I configuring when I create a stateful `LSTM` layer with N `units`?

痞子三分冷 submitted on 2019-11-27 00:09:19
Question: The first argument in a normal Dense layer is also units, and it is the number of neurons/nodes in that layer. A standard LSTM unit, however, looks like the following: (This is a reworked version of "Understanding LSTM Networks") In Keras, when I create an LSTM object like this, LSTM(units=N, ...), am I actually creating N of these LSTM units? Or is it the size of the "Neural Network" layers inside the LSTM unit, i.e., the W's in the formulas? Or is it something else? For context, I'm working
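
One way to see what units controls is to count parameters: a Keras LSTM layer holds four gate weight matrices of shape (input_dim + units, units) plus four bias vectors, so units is the width of the hidden/cell state (the size of the W's), not a count of independent LSTM blocks. A minimal sketch (my illustration; N, input_dim, and timesteps are arbitrary):

    from tensorflow.keras.layers import LSTM, Input
    from tensorflow.keras.models import Model

    N, input_dim, timesteps = 100, 32, 10
    inp = Input(shape=(timesteps, input_dim))
    out = LSTM(units=N)(inp)
    model = Model(inp, out)

    # 4 gates * (N * (N + input_dim) weights + N biases)
    expected = 4 * (N * (N + input_dim) + N)
    print(model.count_params(), expected)   # both print 53200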

How to deal with batches with variable-length sequences in TensorFlow?

做~自己de王妃 submitted on 2019-11-27 00:02:22
Question: I was trying to use an RNN (specifically, LSTM) for sequence prediction. However, I ran into an issue with variable sequence lengths. For example, sent_1 = "I am flying to Dubain" sent_2 = "I was traveling from US to Dubai" I am trying to predict the next word after the current one with a simple RNN based on this Benchmark for building a PTB LSTM model. However, the num_steps parameter (used for unrolling to the previous hidden states) should remain the same in each TensorFlow epoch.
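
The usual approach, sketched below with TF 1.x APIs to match the question's era (my illustration; the shapes and sizes are assumptions), is to pad every sentence to a common length and pass the true lengths to tf.nn.dynamic_rnn via sequence_length, so computation stops at each sequence's real end even though the batch is padded.

    import tensorflow as tf

    batch_size, max_len, embed_dim, hidden = 2, 8, 50, 100
    inputs = tf.placeholder(tf.float32, [batch_size, max_len, embed_dim])
    seq_len = tf.placeholder(tf.int32, [batch_size])   # e.g. [5, 7]

    cell = tf.nn.rnn_cell.LSTMCell(hidden)
    outputs, state = tf.nn.dynamic_rnn(cell, inputs,
                                       sequence_length=seq_len,
                                       dtype=tf.float32)
    # outputs is zero-padded past each true length; state holds the value
    # at the last *valid* step of every sequence.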

TypeError: can't pickle _thread.lock objects in Seq2Seq

天大地大妈咪最大 submitted on 2019-11-26 23:20:56
Question: I'm having trouble using buckets in my TensorFlow model. When I run it with buckets = [(100, 100)], it works fine. When I run it with buckets = [(100, 100), (200, 200)], it doesn't work at all (stack trace at bottom). Interestingly, running TensorFlow's Seq2Seq tutorial gives the same kind of issue with a nearly identical stack trace. For testing purposes, the link to the repository is here. I'm not sure what the issue is, but having more than one bucket always seems to trigger it. This code

TensorFlow: LSTM hidden states vs. final output in TensorFlow's bidirectional LSTM

ε祈祈猫儿з submitted on 2019-11-26 18:39:18
Question: I am trying an experiment on the last state vs. the final output of an LSTM cell. Here is the minimal code. When I am using the last state of the LSTM outputs:

    with tf.variable_scope('forward'):
        fr_cell = tf.contrib.rnn.LSTMCell(num_units=100)
        dropout_fr = tf.contrib.rnn.DropoutWrapper(fr_cell, output_keep_prob=1. - 0.3)

    with tf.variable_scope('backward'):
        bw_cell = tf.contrib.rnn.LSTMCell(num_units=100)
        dropout_bw = tf.contrib.rnn.DropoutWrapper(bw_cell, output_keep_prob=1. - 0.3)

    with tf.variable_scope(
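
For context, here is a minimal sketch (my illustration, not the question's full code; the input shape is an assumption) of how the outputs and final states of a bidirectional LSTM built from cells like those above relate to each other:

    import tensorflow as tf

    inputs = tf.placeholder(tf.float32, [None, 20, 8])   # batch, time, features
    seq_len = tf.placeholder(tf.int32, [None])

    fw = tf.contrib.rnn.LSTMCell(num_units=100)
    bw = tf.contrib.rnn.LSTMCell(num_units=100)
    (out_fw, out_bw), (st_fw, st_bw) = tf.nn.bidirectional_dynamic_rnn(
        fw, bw, inputs, sequence_length=seq_len, dtype=tf.float32)

    # For the forward cell, st_fw.h equals out_fw gathered at each sequence's
    # last valid step; for the backward cell, st_bw.h equals out_bw at step 0.
    # So "last output" and "final hidden state" coincide only for the forward
    # direction, and only when indexing by the true sequence lengths.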

How to apply gradient clipping in TensorFlow?

瘦欲@ submitted on 2019-11-26 18:17:16
Considering the example code, I would like to know how to apply gradient clipping to this network, an RNN where there is a possibility of exploding gradients. tf.clip_by_value(t, clip_value_min, clip_value_max, name=None) is an example of what could be used, but where do I introduce it? In the definition of the RNN?

    lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
    # Split data because the rnn cell needs a list of inputs for the RNN inner loop
    _X = tf.split(0, n_steps, _X)  # n_steps
    tf.clip_by_value(_X, -1, 1, name=None)

But this doesn't make sense, as the tensor _X is the input and not the
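
The standard approach is to clip the gradients, not the inputs: split the optimizer step into compute_gradients and apply_gradients and clip in between. A minimal sketch (my illustration; `loss` and the learning rate stand in for whatever the example code defines):

    import tensorflow as tf

    optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
    grads_and_vars = optimizer.compute_gradients(loss)   # `loss` defined elsewhere
    clipped = [(tf.clip_by_value(g, -1.0, 1.0), v)
               for g, v in grads_and_vars if g is not None]
    train_op = optimizer.apply_gradients(clipped)

    # Alternative: clip by global norm, which preserves the gradient direction.
    grads, variables = zip(*grads_and_vars)
    grads, _ = tf.clip_by_global_norm(grads, 5.0)
    train_op_norm = optimizer.apply_gradients(zip(grads, variables))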

Error when checking model input: expected lstm_1_input to have 3 dimensions, but got array with shape (339732, 29)

对着背影说爱祢 submitted on 2019-11-26 18:16:23
Question: My input is simply a CSV file with 339732 rows and two columns: the first being 29 feature values (i.e. X), the second being a binary label value (i.e. Y). I am trying to train my data on a stacked LSTM model:

    data_dim = 29
    timesteps = 8
    num_classes = 2
    model = Sequential()
    model.add(LSTM(30, return_sequences=True,
                   input_shape=(timesteps, data_dim)))  # returns a sequence of vectors of dimension 30
    model.add(LSTM(30, return_sequences=True))          # returns a sequence of vectors of dimension 30
    model
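
The error says the LSTM expects 3-D input of shape (samples, timesteps, features) while the data is 2-D with shape (339732, 29). One common fix is to slice the rows into overlapping windows of `timesteps` consecutive rows before feeding the model; a minimal sketch (my illustration, with random stand-ins for the CSV data):

    import numpy as np

    X = np.random.rand(339732, 29)           # stand-in for the CSV features
    y = np.random.randint(0, 2, 339732)      # stand-in for the labels
    timesteps = 8

    windows = np.array([X[i:i + timesteps] for i in range(len(X) - timesteps)])
    labels = y[timesteps:]                   # predict the label following each window
    print(windows.shape)                     # (339724, 8, 29) -- a valid 3-D LSTM input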

LSTM

与世无争的帅哥 submitted on 2019-11-26 17:07:35
A recurrent neural network introduces directed cycles into the network, so that a signal passed from one neuron to the next does not vanish immediately but lives on: the input of the hidden layer includes not only the output of the previous layer but also the output of that same hidden layer at the previous time step. The RNN introduces the notion of a hidden state h, which extracts features from sequence-shaped data and is then transformed into the output. The parameters are shared across all time steps. In the classic RNN structure, tanh is usually used as the activation function. There are also N-vs-1 and 1-vs-N RNN structures.
The RNN recurrence is h_t = f(U x_t + W h_{t-1} + b). RNNs have great difficulty with long-range dependencies, i.e. they cannot learn regularities in a sequence that span long time intervals.
RNN development has gone in two directions: one is enriching the hidden layer, e.g. simple RNN, GRU, LSTM, CW-RNN; the other is bidirectionality and deeper networks, e.g. the Bidirectional RNN and the Deep Bidirectional RNN; combining the two yields the DBLSTM.
The LSTM mitigates the exploding- and vanishing-gradient problems of the standard RNN. The LSTM hidden state has two parts, h_t and C_t; C_t carries the main information passed between steps and propagates over long ranges. At each step C_t forgets some of C_{t-1} and remembers some new content.
A classic English explanation: Understanding LSTM Networks, https://colah.github.io/posts/2015-08-Understanding-LSTMs/
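
To make the recurrence h_t = f(U x_t + W h_{t-1} + b) and the parameter sharing concrete, here is a tiny numpy sketch (my illustration; the dimensions are arbitrary):

    import numpy as np

    input_dim, hidden_dim, T = 4, 8, 10
    U = np.random.randn(hidden_dim, input_dim) * 0.1    # input-to-hidden weights
    W = np.random.randn(hidden_dim, hidden_dim) * 0.1   # hidden-to-hidden weights
    b = np.zeros(hidden_dim)

    x = np.random.randn(T, input_dim)                   # one sequence of length T
    h = np.zeros(hidden_dim)                            # h_0

    for t in range(T):
        h = np.tanh(U @ x[t] + W @ h + b)               # same U, W, b at every step
    print(h.shape)                                       # (8,)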