lstm

Input Shape Error in Second-layer (but not first) of Keras LSTM

空扰寡人 submitted on 2019-11-29 12:38:40
EDITED for conciseness. I am trying to build an LSTM model, working off the documentation example at https://keras.io/layers/recurrent/

from keras.models import Sequential
from keras.layers import LSTM

The following three lines of code (plus comment) are taken directly from the documentation link above:

model = Sequential()
model.add(LSTM(32, input_dim=64, input_length=10))
# for subsequent layers, no need to specify the input size:
model.add(LSTM(16))

ValueError: Input 0 is incompatible with layer lstm_2: expected ndim=3, found ndim=2

I get that error above after executing the second model
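The usual cause of this error is that the first LSTM only emits its final output (a 2-D tensor), while a stacked LSTM layer expects a full 3-D sequence as input. A minimal sketch of the standard fix, keeping the question's older-style input_dim/input_length arguments, looks like this:

```python
from keras.models import Sequential
from keras.layers import LSTM

# Sketch of the usual fix: the first LSTM must emit the whole sequence
# (3-D output) so the second LSTM receives the ndim=3 input it expects.
# Layer sizes (32, 16) are taken from the question.
model = Sequential()
model.add(LSTM(32, input_dim=64, input_length=10, return_sequences=True))
model.add(LSTM(16))  # last layer may return only the final output (2-D)
```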

Keras LSTM predicted timeseries squashed and shifted

£可爱£侵袭症+ submitted on 2019-11-29 09:26:37
I'm trying to get some hands-on experience with Keras during the holidays, and I thought I'd start out with the textbook example of timeseries prediction on stock data. What I'm trying to do is: given the last 48 hours' worth of average price changes (percent since previous), predict the average price change of the coming hour. However, when verifying against the test set (or even the training set), the amplitude of the predicted series is way off, and it is sometimes shifted to be either always positive or always negative, i.e., shifted away from the 0% change, which I think would be
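For context, here is a minimal sketch of the kind of setup the question describes (a window of the last 48 hourly percent changes predicting the next hour); the variable names, data, and layer sizes are illustrative assumptions, not the asker's actual code:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

# pct_changes: 1-D array of hourly percent changes (random placeholder data)
pct_changes = np.random.randn(1000).astype("float32")

window = 48
X = np.array([pct_changes[i:i + window] for i in range(len(pct_changes) - window)])
y = pct_changes[window:]
X = X.reshape((-1, window, 1))  # (samples, timesteps, features)

model = Sequential()
model.add(LSTM(32, input_shape=(window, 1)))
model.add(Dense(1))  # regression target: next hour's percent change
model.compile(loss="mse", optimizer="adam")
model.fit(X, y, epochs=2, batch_size=32)
```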

Multilayer Seq2Seq model with LSTM in Keras

六眼飞鱼酱① submitted on 2019-11-29 09:14:31
Question: I was making a seq2seq model in Keras. I had built a single-layer encoder and decoder and they were working fine. But now I want to extend it to a multi-layer encoder and decoder. I am building it using the Keras Functional API.

Training - code for the encoder:

encoder_input = Input(shape=(None, vec_dimension))
encoder_lstm = LSTM(vec_dimension, return_state=True, return_sequences=True)(encoder_input)
encoder_lstm = LSTM(vec_dimension, return_state=True)(encoder_lstm)
encoder_output, encoder_h, encoder_c = encoder
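One common pattern for stacking the encoder (a hedged sketch, not the asker's eventual solution): the intermediate LSTM layers only pass the full sequence along, and only the last LSTM returns the states used to initialize the decoder. vec_dimension is reused from the question; its value here is an assumption:

```python
from keras.layers import Input, LSTM

vec_dimension = 128  # assumed value for illustration

encoder_input = Input(shape=(None, vec_dimension))
# Intermediate layer: forward the whole sequence, no states needed here.
x = LSTM(vec_dimension, return_sequences=True)(encoder_input)
# Final layer: return its states so the decoder can be initialized with them.
encoder_output, encoder_h, encoder_c = LSTM(vec_dimension, return_state=True)(x)
encoder_states = [encoder_h, encoder_c]
```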

Summary of 李宏毅's Deep Learning Videos

痴心易碎 submitted on 2019-11-29 08:54:41
Video source: 李宏毅 Deep Learning (NLP), 2017.

Video summary:

P1 covers the construction of RNN, LSTM, and GRU networks.

P2 covers the principles of convolution and pooling, as well as some less conventional pooling methods. It also mentions a special RNN structure, stackRNN.

P3 covers backpropagation in deep learning, including the chain rule and the backpropagation procedures for fully connected networks and for RNNs.

P4 covers language models:
n-gram: P(a|b) is estimated directly from corpus counts.
NN-based LM: P(a|b) is computed by a network; the underlying idea is matrix factorization.
RNN-based LM: P(a|b,c) is computed by a network; the RNN can reuse the hidden state from the previous step, so it computes P(a|b,c) with far fewer parameters than the traditional approach.

P5 covers the Spatial Transformer Layer. The main idea is to rotate and scale the image before it enters the CNN and only then perform recognition, which addresses the fact that a CNN cannot recognize images that have been rotated or scaled. (It feels like random crop could also partially solve this.) The transformation layer needs only 6 parameters to transform the image coordinates:

$$\begin{bmatrix} w_1 & w_2 \\ w_3 & w_4 \end{bmatrix}\begin{bmatrix} x_{before} \\ y_{before} \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} x_{after} \\ y_{after} \end{bmatrix}$$
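As a small illustration of the 6-parameter affine transform described above (a sketch, not code from the video; the parameter values are made up):

```python
import numpy as np

# The 6 parameters of the transform: a 2x2 weight matrix (rotation/scaling)
# plus a 2-D bias (translation). Values below are arbitrary, for illustration.
W = np.array([[0.9, -0.1],
              [0.1,  0.9]])
b = np.array([2.0, -1.0])

xy_before = np.array([5.0, 3.0])   # a coordinate in the input image
xy_after = W @ xy_before + b       # where that coordinate lands after the transform
print(xy_after)
```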

Implementing LSTM in TensorFlow 2.0

那年仲夏 submitted on 2019-11-29 08:28:05
from __future__ import absolute_import, division, print_function, unicode_literals

import collections
import matplotlib.pyplot as plt
import numpy as np

try:
    # %tensorflow_version only exists in Colab.
    %tensorflow_version 2.x
except Exception:
    pass

import tensorflow as tf
from tensorflow.keras import layers

Model design:

batch_size = 128  # adjustable: how many samples to process at once
# Each MNIST image batch is a tensor of shape (batch_size, 28, 28).
# Each input sequence will be of size (28, 28) (height is treated like time).
input_dim = 28  # must equal the number of feature columns
units = 64  # the LSTM output_size
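A hedged sketch of how a model like this is typically completed (the layer choices and the output size of 10 for the MNIST digit classes are illustrative assumptions, not necessarily the rest of the original post):

```python
output_size = 10  # number of MNIST classes

# Each 28x28 image is fed to the LSTM as 28 time steps of 28 features.
model = tf.keras.models.Sequential([
    layers.LSTM(units, input_shape=(None, input_dim)),
    layers.BatchNormalization(),
    layers.Dense(output_size, activation="softmax"),
])
model.compile(
    loss="sparse_categorical_crossentropy",
    optimizer="sgd",
    metrics=["accuracy"],
)
model.summary()
```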

RNN

断了今生、忘了曾经 submitted on 2019-11-29 05:00:56
Contents
1. Why do we need RNNs
2. The structure of LSTM
3. LSTM networks
4. Evaluating RNNs
5. Applications of RNNs
6. Attention-based models

1. Why do we need RNNs?

In a traditional neural network, one input corresponds to one output; if the input does not change, the output does not change either. In the example below, "Taipei" belongs to the destination slot in one sentence, but in another sentence it belongs to the departure slot. This is where the network needs some form of memory.

In fact, recurrent neural networks are a class of neural networks with memory: networks that carry the result of the previous step into the next step. A simple RNN has one hidden layer whose output is carried into the next step and fed to the network together with the next step's input. A bidirectional RNN has the advantage that it can see not only the preceding context but also the following context.

2. The structure of LSTM

Now LSTM makes its grand entrance! LSTM is a kind of RNN; in fact, when people say they are doing RNNs nowadays, they usually mean LSTMs. LSTM has become the de facto standard. An LSTM memory cell (i.e., one unit) has four inputs and one output in total. The advantage of this structure is that it can control whether a word should be let in, whether the information from the previous word should be forgotten, and whether the result should be output. Here is an LSTM example:
// For space reasons, a few more figures are not shown; see Prof. Lee's (李宏毅) RNN Part I slides.

3. LSTM networks
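To make the "four inputs, one output" description concrete, the LSTM cell can be written out as follows (a generic textbook formulation, not taken from the slides): the candidate input plus the three gate signals are the four inputs, and $h_t$ is the single output.

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{input gate} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{forget gate} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{output gate} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{candidate input} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{memory cell update} \\
h_t &= o_t \odot \tanh(c_t) && \text{single output}
\end{aligned}
$$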

LSTM Keras API predicting multiple outputs

安稳与你 submitted on 2019-11-29 03:59:48
Question: I'm training an LSTM model using as input a sequence of 50 steps of 3 different features, laid out as below:

# x_train
[[[a0,b0,c0], ..., [a49,b49,c49]],
 [[a1,b1,c1], ..., [a50,b50,c50]],
 ...
 [[a49,b49,c49], ..., [a99,b99,c99]]]

and using the following dependent variable:

# y_train
[a50, a51, a52, ..., a99]

The code below works to predict just a; how do I get it to predict and return a vector of [a,b,c] at a given timestep?

def build_model():
    model = Sequential()
    model.add(LSTM(
        input_shape=(50,3),
        return
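One way to predict all three features at once (a hedged sketch, not the asker's original code): give the output layer 3 units and make each y_train row the triple [a, b, c]. Layer sizes and the placeholder data are assumptions:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

def build_model():
    model = Sequential()
    model.add(LSTM(64, input_shape=(50, 3)))
    model.add(Dense(3))  # one output per feature: [a, b, c]
    model.compile(loss="mse", optimizer="adam")
    return model

# Hypothetical shapes: 50 windows of 50 timesteps x 3 features.
x_train = np.random.randn(50, 50, 3).astype("float32")
y_train = np.random.randn(50, 3).astype("float32")  # targets are [a, b, c]
build_model().fit(x_train, y_train, epochs=2, batch_size=16)
```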

Stateful LSTM: When to reset states?

岁酱吖の submitted on 2019-11-29 02:19:42
Given X with dimensions (m samples, n sequences, k features), and y labels with dimensions (m samples, 0/1): suppose I want to train a stateful LSTM (going by the Keras definition, where "stateful = True" means that cell states are not reset between sequences per sample -- please correct me if I'm wrong!). Are states supposed to be reset on a per-epoch basis or a per-sample basis?

Example:

for e in epoch:
    for m in X.shape[0]:      # for each sample
        for n in X.shape[1]:  # for each sequence
            # train_on_batch for model...
        # model.reset_states()  (1) I believe this is 'stateful = False'?
    # model.reset_states(
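A hedged sketch of the stateful pattern in question (not the asker's code): with stateful=True, Keras carries the cell state across successive batches until reset_states() is called explicitly, so the reset typically goes wherever the carried-over context stops making sense, e.g. once per epoch. Shapes and sizes below are assumptions:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

m, n, k = 32, 10, 4              # samples, timesteps, features (assumed)
X = np.random.randn(m, n, k).astype("float32")
y = np.random.randint(0, 2, size=(m, 1)).astype("float32")

# Stateful LSTM needs a fixed batch size, given via batch_input_shape.
model = Sequential()
model.add(LSTM(16, batch_input_shape=(m, n, k), stateful=True))
model.add(Dense(1, activation="sigmoid"))
model.compile(loss="binary_crossentropy", optimizer="adam")

for epoch in range(5):
    model.train_on_batch(X, y)   # state is preserved between these calls
    model.reset_states()         # reset once per epoch
```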

How to interpret weights in a LSTM layer in Keras [closed]

心已入冬 submitted on 2019-11-28 23:40:33
I'm currently training a recurrent neural network for weather forecasting, using an LSTM layer. The network itself is pretty simple and looks roughly like this:

model = Sequential()
model.add(LSTM(hidden_neurons, input_shape=(time_steps, feature_count), return_sequences=False))
model.add(Dense(feature_count))
model.add(Activation("linear"))

The weights of the LSTM layer have the following shapes:

for weight in model.get_weights():  # weights from Dense layer omitted
    print(weight.shape)

> (feature_count, hidden_neurons)
> (hidden_neurons, hidden_neurons)
> (hidden_neurons,)
> (feature_count,
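For comparison, in newer Keras versions the per-gate arrays shown above are merged: the LSTM layer exposes a kernel, a recurrent kernel, and a bias, with the four gates concatenated along the last axis in the order input, forget, cell, output. A hedged sketch of inspecting them (sizes are assumed values, not the asker's):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

hidden_neurons, time_steps, feature_count = 32, 24, 5  # assumed values

model = Sequential()
model.add(LSTM(hidden_neurons, input_shape=(time_steps, feature_count)))
model.add(Dense(feature_count))
model.add(Activation("linear"))

kernel, recurrent_kernel, bias = model.layers[0].get_weights()
print(kernel.shape)            # (feature_count, 4 * hidden_neurons)
print(recurrent_kernel.shape)  # (hidden_neurons, 4 * hidden_neurons)
print(bias.shape)              # (4 * hidden_neurons,)

# Split each array into per-gate blocks, in the order i, f, c, o.
W_i, W_f, W_c, W_o = np.split(kernel, 4, axis=1)
```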

Gradient Explosion and Vanishing in Neural Networks, Gated Recurrent Units (GRU), and Long Short-Term Memory (LSTM)

一世执手 submitted on 2019-11-28 23:08:39
Deep networks: if the weights w are larger than 1, the activations and gradients explode; if w is smaller than 1, the gradients vanish. Randomizing the initial weights carefully helps mitigate this problem.

RNNs have the same vanishing-gradient problem: during backpropagation, the gradients from later steps barely reach the earlier layers, which affects the computation in those layers. Exploding gradients cause numerical overflow and can be handled by clipping or rescaling.

GRU: the memory cell c^{<t>} provides the ability to remember. The candidate value c̃^{<t>} rewrites the memory cell. The update gate Γ_u is a number between 0 and 1 that decides whether to update the memory cell. The relevance gate Γ_r represents the relevance between the candidate value and the memory cell.

LSTM: update gate, forget gate, output gate. The update and forget gates give the memory cell the choice of keeping the old value or taking the new one. Peephole connections simply mean that the three gate values depend not only on a^{<t-1>} and x^{<t>}, but also on c^{<t-1>}.

GRU is simpler, is well suited to building large architectures, and is computationally faster. LSTM is more flexible, and most people still choose LSTM.

Source: https://www.cnblogs.com/biwangwang/p/11432803.html
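For reference, the GRU update described above can be written out in the same Γ_u / Γ_r / c^{<t>} notation (standard textbook equations, not taken from the linked post):

$$
\begin{aligned}
\Gamma_u &= \sigma\!\left(W_u\,[c^{<t-1>},\; x^{<t>}] + b_u\right) \\
\Gamma_r &= \sigma\!\left(W_r\,[c^{<t-1>},\; x^{<t>}] + b_r\right) \\
\tilde{c}^{<t>} &= \tanh\!\left(W_c\,[\Gamma_r * c^{<t-1>},\; x^{<t>}] + b_c\right) \\
c^{<t>} &= \Gamma_u * \tilde{c}^{<t>} + (1-\Gamma_u) * c^{<t-1>}
\end{aligned}
$$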