lstm

What is the intuition of using tanh in LSTM

老子叫甜甜 submitted on 2019-11-28 13:45:29
Question: In an LSTM network (Understanding LSTMs), why do the input gate and output gate use tanh? What is the intuition behind this? Is it just a nonlinear transformation? If so, can I change both to another activation function (e.g. ReLU)? Answer 1: Sigmoid, specifically, is used as the gating function for the three gates (input, output, forget) in an LSTM: since it outputs a value between 0 and 1, it can allow either no flow or complete flow of information through a gate. On the other hand, to overcome the vanishing gradient problem ...
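A minimal numpy sketch of a single LSTM step may make the division of labour visible (the weight dictionaries W, U, b and their sizes are placeholders invented for illustration, not any library's internals): the sigmoid gates squash to (0, 1) and act as soft on/off switches, while tanh produces the bounded, signed values that are actually written into the cell state.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # gates use sigmoid: each entry is a "how much flows through" factor in (0, 1)
    f = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])
    i = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])
    o = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])
    # the candidate uses tanh: values in (-1, 1), so the cell state can move up or down
    c_hat = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])
    c = f * c_prev + i * c_hat
    h = o * np.tanh(c)        # tanh again keeps the emitted hidden state bounded
    return h, c

rng = np.random.default_rng(0)
m, n = 4, 8                   # hypothetical input and hidden sizes
W = {k: rng.standard_normal((n, m)) for k in 'fioc'}
U = {k: rng.standard_normal((n, n)) for k in 'fioc'}
b = {k: np.zeros(n) for k in 'fioc'}
h, c = lstm_step(rng.standard_normal(m), np.zeros(n), np.zeros(n), W, U, b)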

What is the fastest way to prepare data for RNN with numpy?

蹲街弑〆低调 submitted on 2019-11-28 13:07:15
I currently have a (1631160, 78) np array as the input to a neural network. I would like to try something with LSTM, which requires a 3D structure as input data. I'm currently using the following code to generate the 3D structure needed, but it is super slow (ETA > 1 day). Is there a better way to do this with numpy? My current code to generate the data:

def transform_for_rnn(input_x, input_y, window_size):
    output_x = None
    start_t = time.time()
    for i in range(len(input_x)):
        if i > 100 and i % 100 == 0:
            sys.stdout.write('\rTransform Data: %d/%d\tETA:%s' % (i, len(input_x), str(datetime.timedelta(seconds= ...
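A sketch of a vectorized alternative, assuming numpy >= 1.20 for sliding_window_view and assuming each label is aligned with the last row of its window (the truncated snippet does not show how output_y was built, so that alignment is an assumption):

import numpy as np

def transform_for_rnn_fast(input_x, input_y, window_size):
    # Build every overlapping window in one call instead of a Python loop.
    # sliding_window_view returns shape (n_samples, n_features, window_size)
    windows = np.lib.stride_tricks.sliding_window_view(input_x, window_size, axis=0)
    output_x = np.ascontiguousarray(windows.transpose(0, 2, 1))   # -> (n_samples, window_size, n_features)
    output_y = input_y[window_size - 1:]                          # assumed: label of each window's last row
    return output_x, output_y

X = np.random.rand(10000, 78).astype(np.float32)
y = np.random.rand(10000)
X3d, y3d = transform_for_rnn_fast(X, y, window_size=30)
print(X3d.shape)   # (9971, 30, 78)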

Multivariate LSTM with missing values

杀马特。学长 韩版系。学妹 submitted on 2019-11-28 09:59:45
I am working on a time series forecasting problem using LSTM. The input contains several features, so I am using a multivariate LSTM. The problem is that there are some missing values, for example:

     Feature 1   Feature 2   ...   Feature n
1    2           4                 nan
2    5           8                 10
3    8           8                 5
4    nan         7                 7
5    6           nan               12

Instead of interpolating the missing values, which can introduce bias in the results (sometimes there are many consecutive timestamps with missing values in the same feature), I would like to know if there is a way to let the LSTM learn with the missing values, for example using a masking layer or something ...
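A Keras Masking layer only skips a timestep when every feature equals the mask value, so it does not cover the per-feature gaps shown above. A common workaround, sketched below under the assumption that NaNs can be replaced by a neutral constant, is to append a binary "was missing" indicator per feature so the network can learn to discount the filled-in values (the layer sizes and toy data are made up for illustration):

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

def add_missing_indicators(X, fill_value=0.0):
    # Append a 0/1 flag per feature marking where a value was missing, then fill the NaNs.
    missing = np.isnan(X).astype(np.float32)
    X_filled = np.where(np.isnan(X), fill_value, X)
    return np.concatenate([X_filled, missing], axis=-1)

# toy data: 100 samples, 10 timesteps, 3 features, with NaNs scattered in
X = np.random.rand(100, 10, 3)
X[X < 0.1] = np.nan
y = np.random.rand(100, 1)

X_in = add_missing_indicators(X)                      # shape (100, 10, 6)

model = Sequential()
model.add(LSTM(32, input_shape=(10, X_in.shape[-1])))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.fit(X_in, y, epochs=2, batch_size=16, verbose=0)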

Initializing LSTM hidden state Tensorflow/Keras

ぐ巨炮叔叔 submitted on 2019-11-28 08:43:38
Can someone explain how I can initialize the hidden state of an LSTM in TensorFlow? I am trying to build an LSTM recurrent auto-encoder, so after I have that model trained I want to transfer the learned hidden state of the unsupervised model to the hidden state of the supervised model. Is that even possible with the current API? This is the paper I am trying to recreate: http://papers.nips.cc/paper/5949-semi-supervised-sequence-learning.pdf

Yes - this is possible but truly cumbersome. Let's go through an example. Defining a model:

from keras.layers import LSTM, Input
from keras.models import Model

input = Input(batch_shape=(32, ...
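Since the answer above is cut off, here is an independent sketch of the mechanism it relies on: in the Keras functional API an LSTM layer accepts initial_state, so the hidden and cell states can be fed in as extra model inputs. The sizes and the zero states below are placeholders; in the paper's setting they would come from the pretrained encoder.

import numpy as np
from keras.layers import LSTM, Input, Dense
from keras.models import Model

timesteps, features, units = 16, 8, 64

seq_in = Input(shape=(timesteps, features))
h_init = Input(shape=(units,))    # initial hidden state h_0
c_init = Input(shape=(units,))    # initial cell state c_0

lstm_out = LSTM(units)(seq_in, initial_state=[h_init, c_init])
output = Dense(1)(lstm_out)

model = Model([seq_in, h_init, c_init], output)
model.compile(optimizer='adam', loss='mse')

# seed the supervised model with states produced elsewhere (zeros here as a stand-in)
x = np.random.rand(4, timesteps, features)
h0 = np.zeros((4, units))
c0 = np.zeros((4, units))
pred = model.predict([x, h0, c0])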

how to reshape text data to be suitable for LSTM model in keras

点点圈 submitted on 2019-11-28 08:12:47
Question: Update 1: The code I'm referring to is exactly the code in the book, which you can find here. The only thing is that I don't want to have embed_size in the decoder part. That's why I think I don't need an embedding layer at all, because if I add an embedding layer I need to have embed_size in the decoder part (please correct me if I'm wrong). Overall, I'm trying to adapt the same code without using the embedding layer, because I need to have vocab_size in the decoder part. I think the suggestion ...
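If the embedding layer is dropped, the usual alternative is to feed one-hot vectors of width vocab_size directly to the LSTM, so both the encoder input and the decoder output live in vocab_size dimensions. A minimal sketch with made-up sizes (vocab_size, seq_len and the random data are placeholders, not the book's values):

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

vocab_size, seq_len = 50, 20    # hypothetical sizes for illustration

def one_hot_sequences(int_sequences, vocab_size):
    # integer-encoded sentences -> tensor of shape (samples, seq_len, vocab_size)
    out = np.zeros((len(int_sequences), len(int_sequences[0]), vocab_size), dtype=np.float32)
    for i, seq in enumerate(int_sequences):
        for t, token in enumerate(seq):
            out[i, t, token] = 1.0
    return out

X_int = np.random.randint(0, vocab_size, size=(32, seq_len))
X = one_hot_sequences(X_int, vocab_size)

model = Sequential()
model.add(LSTM(128, input_shape=(seq_len, vocab_size), return_sequences=True))
model.add(TimeDistributed(Dense(vocab_size, activation='softmax')))
model.compile(optimizer='adam', loss='categorical_crossentropy')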

Predicting a multiple forward time step of a time series using LSTM

安稳与你 submitted on 2019-11-28 06:01:35
I want to predict certain values that are weekly predictable (low SNR). I need to predict the whole time series of a year, formed by the weeks of the year (52 values - Figure 1). My first idea was to develop a many-to-many LSTM model (Figure 2) using Keras over TensorFlow. I'm training the model with an input of 52 values (the given time series of the previous year) and a predicted output of 52 values (the time series of the next year). The shape of train_X is (X_examples, 52, 1); in other words, X_examples to train, 52 timesteps of 1 feature each. I understand that Keras will consider the 52 inputs as a time ...
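One common way to set up such a many-to-many (sequence-to-sequence) model in Keras is an encoder that reads the 52 weeks of the previous year and a RepeatVector plus decoder that emits the 52 weeks of the next year. A sketch with arbitrary layer sizes and toy data, not a tuned model for this problem:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

n_steps_in, n_steps_out, n_features = 52, 52, 1

model = Sequential()
model.add(LSTM(100, input_shape=(n_steps_in, n_features)))   # encode last year's 52 weeks
model.add(RepeatVector(n_steps_out))                         # repeat the summary 52 times
model.add(LSTM(100, return_sequences=True))                  # decode into 52 output steps
model.add(TimeDistributed(Dense(1)))                         # one predicted value per week
model.compile(optimizer='adam', loss='mse')

X = np.random.rand(10, n_steps_in, n_features)               # toy data in the train_X shape above
y = np.random.rand(10, n_steps_out, 1)
model.fit(X, y, epochs=2, verbose=0)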

Deep Learning: Understanding Long Short-Term Memory (LSTM) Networks

五迷三道 submitted on 2019-11-28 05:55:15
Translated from the blog post http://colah.github.io/posts/2015-08-Understanding-LSTMs/ (a rough translation; the original author explains it very well).

RNN Networks

Humans don't start thinking from scratch every moment. As you read this article, you understand each word based on your understanding of the previous words. You don't throw everything away and start thinking from scratch again; your thoughts have persistence.

Traditional neural networks can't do this, and it seems like a major shortcoming. For example, it is unclear how a traditional neural network could use its reasoning about earlier events in a film to inform later ones.

Recurrent neural networks solve this problem. They are networks with loops, which allow information to persist.

[Figure: a recurrent neural network has loops.]

In the figure above, a chunk of neural network, A, takes some input x_t and outputs a value h_t. The loop allows information to be passed from one step of the network to the next.

These loops make recurrent neural networks seem a little mysterious. But if you think about it a bit more, it turns out they are not so different from an ordinary neural network. A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to its successor. Consider what happens if we unroll the loop:

[Figure: an unrolled recurrent neural network.]

This chain-like nature shows that recurrent neural networks are intimately related to sequences and lists. They are the natural neural-network architecture for this kind of data. And they are certainly used! In the past few years there has been incredible success applying RNNs to a variety of problems: speech recognition, language modeling, translation, image captioning... the list goes on. I will discuss the amazing feats that can be achieved with RNNs ...
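A minimal numpy sketch of the "unrolled loop" idea described above: the same cell (the same weights) is applied at every timestep, and the hidden state h it passes forward is the persistent "memory". The sizes and random weights are made up purely for illustration.

import numpy as np

def simple_rnn(inputs, W, U, b):
    # one copy of the same cell per timestep; h is the message passed along the chain
    h = np.zeros(U.shape[0])
    outputs = []
    for x_t in inputs:
        h = np.tanh(W @ x_t + U @ h + b)
        outputs.append(h)
    return np.stack(outputs)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 3))   # input-to-hidden weights (hypothetical sizes)
U = rng.standard_normal((8, 8))   # hidden-to-hidden weights
b = np.zeros(8)
xs = rng.standard_normal((5, 3))  # a sequence of 5 inputs with 3 features each
hs = simple_rnn(xs, W, U, b)      # hs[t] is h_t, the value produced at step t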

How to calculate the number of parameters of an LSTM network?

孤人 submitted on 2019-11-28 05:32:39
Is there a way to calculate the total number of parameters in an LSTM network? I have found an example but I'm unsure how correct it is, or whether I have understood it correctly. For example, consider the following:

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import Embedding
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(256, input_dim=4096, input_length=16))
model.summary()

Output:
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param ...
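For a single standard LSTM layer the count is 4 * (input_dim * units + units * units + units): each of the four blocks (forget, input and output gates plus the cell candidate) has an input kernel, a recurrent kernel and a bias. A quick check for the layer above:

def lstm_param_count(units, input_dim):
    # 4 blocks, each with an (input_dim x units) kernel, a (units x units)
    # recurrent kernel and a bias vector of length units
    return 4 * (input_dim * units + units * units + units)

print(lstm_param_count(256, 4096))   # 4457472, which model.summary() reports for this layer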

Deep Learning Hyperparameter Tuning Strategies (Part 1)

走远了吗. submitted on 2019-11-28 05:17:24
I'm often asked: when you train a model with deep learning, how do you improve your results? Each time I'm stumped, partly because I don't know that much, partly because I haven't experimented that much, and partly because my memory is poor and I forget. So I'm writing this post to record other people's experience as well as my own.

Ilya Sutskever (a student of Hinton) shared his insights and practical advice on deep learning:

Get the data: make sure you have a high-quality input/output dataset that is large enough, representative, and has relatively clean labels. Without such a dataset it is hard to succeed.

Preprocessing: centering the data is very important, that is, make the data zero-mean and give each dimension unit variance. Sometimes, when an input dimension varies over orders of magnitude, it is better to use log(1 + x) of that dimension. Basically, what matters is to find a faithful encoding of zero and a natural scaling of the dimensions; doing so makes learning work better. This is the case because the weights are updated via the change in w_ij ∝ x_i · ∂L/∂y_j (where w denotes the weights from layer x to layer y and L is the loss function). If the mean of x is large (e.g. 100), the weight updates will be very large and correlated, which makes learning poor and slow. Keeping zero mean and small variance is a key factor for success.

Minibatches: on today's computers it is very inefficient to process one training example at a time. Processing a batch of, say, 128 examples instead is far more efficient, because the throughput gain is considerable. In fact, a batch size on the order of 1 also works well; it not only improves performance but also reduces overfitting, though it may be outperformed by large batches. But don't use too large a batch ...
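A minimal numpy sketch of the preprocessing advice above, assuming a plain 2D feature matrix (the column chosen for log(1 + x) and the toy data are made up for illustration):

import numpy as np

def standardize(X, log_columns=()):
    # zero-mean, unit-variance features, with log(1 + x) applied first to heavy-tailed columns
    X = X.astype(np.float64).copy()
    for c in log_columns:
        X[:, c] = np.log1p(X[:, c])
    mean = X.mean(axis=0)
    std = X.std(axis=0) + 1e-8      # avoid division by zero for constant columns
    return (X - mean) / std, mean, std

X = np.abs(np.random.randn(1000, 5)) * np.array([1.0, 10.0, 100.0, 1000.0, 1.0])
X_std, mu, sigma = standardize(X, log_columns=(3,))   # column 3 spans several orders of magnitude
print(X_std.mean(axis=0).round(3), X_std.std(axis=0).round(3))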

RNN, LSTM, GRU cells

喜夏-厌秋 submitted on 2019-11-28 05:15:21
Needed for a project, so these are just brief notes on the cells; the specific reasons for the improvements will be written up later when there is time.

RNN cell: [diagram]
LSTM cell: [diagram]
GRU cell: [diagram]

References:
1. https://towardsdatascience.com/animated-rnn-lstm-and-gru-ef124d06cf45#50f0
2. https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21

Source: https://www.cnblogs.com/qinduanyinghua/p/11393581.html
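Since the cell diagrams did not survive, here are the standard update equations for the LSTM and GRU cells as usually presented in guides like those linked above (x_t is the input, h_{t-1} the previous hidden state, \sigma the sigmoid, \odot the elementwise product):

LSTM:
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
h_t = o_t \odot \tanh(c_t)

GRU:
z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)
r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)
\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h)
h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t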