
A Summary of LSTM Usage

故事扮演 submitted on 2019-12-28 00:39:02
# Define the LSTM cell.
lstm = tf.nn.rnn_cell.BasicLSTMCell(lstm_hidden_size)

# Initialize the LSTM state to all zeros.
# state.c and state.h correspond to the c state and the h state in the figure.
# As with other neural networks, each optimization step of a recurrent neural
# network also uses one batch of training samples.
state = lstm.zero_state(batch_size, tf.float32)

# Define the loss.
loss = 0.0

# Forward pass.
for i in range(num_steps):
    # Each iteration processes one moment of the time series. Feeding the
    # current input current_input (Xt) and the previous state (Ht-1 and Ct-1)
    # into the LSTM yields the current output lstm_output (Ht) and the updated
    # state (Ht and Ct). lstm_output is passed on to other layers, while state
    # is passed to the next time step; the two can be treated differently,
    # for example with respect to dropout.
    lstm_output, state = lstm(current_input, state)
    # Feed the LSTM output at the current step into a fully connected layer
    # to get the final output.
    final_output = fully_connected(lstm_output)
    # Accumulate the loss at the current step.
    loss += calc_loss(final_output, expected_output)
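
For reference, a minimal runnable counterpart of the unrolled loop above, written here with the tf.keras API (TF 2.x, eager execution); the dimensions, the dummy data and the squared-error loss are placeholders for illustration, not part of the original snippet.

import tensorflow as tf

batch_size, num_steps, input_dim, lstm_hidden_size = 4, 10, 8, 16
cell = tf.keras.layers.LSTMCell(lstm_hidden_size)
fully_connected = tf.keras.layers.Dense(1)

inputs = tf.random.normal([batch_size, num_steps, input_dim])   # dummy series
targets = tf.random.normal([batch_size, num_steps, 1])          # dummy targets

# The state is the pair (h, c), both initialized to zeros.
state = [tf.zeros([batch_size, lstm_hidden_size]),
         tf.zeros([batch_size, lstm_hidden_size])]

loss = 0.0
for t in range(num_steps):
    current_input = inputs[:, t, :]                  # Xt
    lstm_output, state = cell(current_input, state)  # Ht and (Ht, Ct)
    final_output = fully_connected(lstm_output)
    loss += tf.reduce_mean(tf.square(final_output - targets[:, t, :]))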

Recurrent Neural Networks vs LSTM

走远了吗. submitted on 2019-12-26 11:04:39
Recurrent Neural Network

RNNs are good at handling sequence problems; let's look at how they work. The idea is this: each hidden unit of the network has a corresponding memory cell that stores that unit's output at the current time step, and this memory is fed back into the RNN as part of the input at the next time step, and so on.

Consider an example. Suppose every weight is 1, there are no biases, every activation function is linear, and the memory cells start at 0. Feed in the sequence [1,1], [1,1], [2,2], ... and compute the outputs step by step. For the first input [1,1], each hidden unit computes 1×1 + 1×1 + 0×1 = 2, and the output units then compute 2×1 + 2×1 = 4, so the network outputs [4,4]; the hidden values [2,2] are written into the memory cells and become part of the next step's input. Continuing this way gives the output sequence [4,4], [12,12], [32,32], ... A different input sequence would of course give a different output sequence.

An RNN uses the same network (the same weights) at every time step; only the input and the memory change. This is what allows it, when analysing a sentence, to distinguish the different meanings of the same word appearing in different positions. RNNs can also be deep, and there are two classic variants, the Elman network (which feeds back the hidden state) and the Jordan network (which feeds back the output). There are also bidirectional RNNs, which take both the earlier and the later parts of a sentence into account.

Long Short-term Memory (LSTM)

An LSTM cell has 4 inputs and 1 output: besides the cell input itself, there are three signals controlling the input, forget and output gates.
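
The toy computation above is easy to verify in a few lines; a minimal sketch (numpy), assuming the all-ones weights, linear activations and zero-initialized memory stated in the example:

import numpy as np

# Toy Elman-style RNN from the example: all weights are 1, no bias,
# linear activations, memory (hidden state) starts at zero.
W_in = np.ones((2, 2))    # input  -> hidden
W_mem = np.ones((2, 2))   # memory -> hidden
W_out = np.ones((2, 2))   # hidden -> output

memory = np.zeros(2)
for x in [np.array([1, 1]), np.array([1, 1]), np.array([2, 2])]:
    hidden = x @ W_in + memory @ W_mem   # first step: [2, 2]
    output = hidden @ W_out              # first step: [4, 4]
    memory = hidden                      # stored for the next time step
    print(output)                        # [4, 4], then [12, 12], then [32, 32]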

indices = 2 is not in [0, 1)

谁说我不能喝 submitted on 2019-12-25 18:47:19
Question: I'm working on a seq2sql project and I have successfully built a model, but I get an error when training. I'm not using any Keras embedding layer.

M = 13          # question length
d = 40          # dimension of the LSTM
C = 12          # number of table columns
batch_size = 9

inputs1 = Input(shape=(M, 100), name='question_token')
Hq = Bidirectional(LSTM(d, return_sequences=True), name='QuestionENC')(inputs1)  # Hq shape is (num_samples, 13, 80)

inputs2 = Input(shape=(C, 3, 100), name='col_token')
col_lstm_layer = Bidirectional(LSTM(d, return
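
For context, a minimal sketch of how the two encoders described above could be wired into one model; the TimeDistributed wrapper around the column encoder (to handle its 4-D input) and the name 'ColumnENC' are assumptions for illustration, not the asker's actual code.

from tensorflow.keras.layers import Input, LSTM, Bidirectional, TimeDistributed
from tensorflow.keras.models import Model

M, d, C = 13, 40, 12

inputs1 = Input(shape=(M, 100), name='question_token')
Hq = Bidirectional(LSTM(d, return_sequences=True), name='QuestionENC')(inputs1)  # (None, 13, 80)

inputs2 = Input(shape=(C, 3, 100), name='col_token')
# The column input is 4-D, so each column's 3 tokens are encoded separately by
# wrapping the bidirectional LSTM in TimeDistributed (an assumption, not the
# original model), giving one 80-dim vector per column.
Hc = TimeDistributed(Bidirectional(LSTM(d)), name='ColumnENC')(inputs2)          # (None, 12, 80)

model = Model(inputs=[inputs1, inputs2], outputs=[Hq, Hc])
model.summary()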

Learn Something New! Understanding the Attention Mechanism in Deep Learning in One Article

早过忘川 submitted on 2019-12-25 16:39:20
Full text: about 11,413 characters; estimated reading time 33 minutes.

"Every once in a while, a revolutionary product comes along that changes everything." (Steve Jobs)

What does one of the best-known quotes of the 21st century have to do with deep learning? Think about it: growing computing power has brought a series of unprecedented breakthroughs, and if you trace them back to their source, the answer points to the attention mechanism. In short, this new concept is changing the way we apply deep learning.

The attention mechanism is one of the most valuable breakthroughs in deep learning research of the past decade. It has driven many of the recent advances in natural language processing (NLP), including the Transformer architecture and Google's BERT. If you work in NLP (or plan to), you need to know what attention is and how it works.

This article covers the basics of several attention mechanisms, their workflow, and the assumptions and intuition behind them. It also gives the mathematical formulation of attention, plus code that lets you easily implement attention-based architectures in Python.

Outline
- The attention mechanism has changed the way we apply deep learning algorithms
- It has transformed fields such as natural language processing (NLP) and even computer vision
- This article explains how attention works in deep learning and how to implement it in Python

Contents
1. What is attention?
   1. How attention was introduced into deep learning
   2. Understanding the attention mechanism
2. Implementing a simple attention model in Python with Keras
3. Global vs. local attention
4.
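
The article's own Keras code is not included in this excerpt; as a stand-in, here is a minimal sketch of a simple additive (Bahdanau-style) attention layer that pools LSTM outputs into a context vector. The class name SimpleAttention, the layer sizes and the toy classification head are illustrative assumptions, not the article's implementation.

import tensorflow as tf
from tensorflow.keras import layers

class SimpleAttention(layers.Layer):
    """Additive (Bahdanau-style) attention pooling over a sequence of states."""
    def __init__(self, units=64, **kwargs):
        super().__init__(**kwargs)
        self.score_hidden = layers.Dense(units, activation='tanh')
        self.score_out = layers.Dense(1)

    def call(self, hidden_states):                                  # (batch, timesteps, d)
        scores = self.score_out(self.score_hidden(hidden_states))   # (batch, timesteps, 1)
        weights = tf.nn.softmax(scores, axis=1)                     # attention weights over time
        return tf.reduce_sum(weights * hidden_states, axis=1)       # context vector, (batch, d)

# Illustrative usage: an attention-pooled LSTM classifier on toy shapes.
inputs = layers.Input(shape=(50, 32))                # 50 timesteps, 32 features
h = layers.LSTM(64, return_sequences=True)(inputs)
context = SimpleAttention(units=64)(h)
outputs = layers.Dense(1, activation='sigmoid')(context)
model = tf.keras.Model(inputs, outputs)
model.summary()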

Tensorflow - LSTM state reuse within batch

孤街浪徒 submitted on 2019-12-25 08:24:36
Question: I am working on a TensorFlow NN which uses an LSTM to track a parameter (a time-series regression problem). A batch of training data contains batch_size consecutive observations. I would like to use the LSTM state as input to the next sample: if I have a batch of data observations, I would like to feed the state of the first observation as input to the second observation, and so on. Below, I define the LSTM state as a tensor of size batch_size. I would like to reuse the state
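
A minimal sketch of one way to chain the LSTM state through consecutive observations, using the TF1-style tf.nn.rnn_cell API that other snippets on this page use; the dimensions and the placeholder are illustrative, not taken from the question.

import tensorflow.compat.v1 as tf   # TF1-style graph API (plain `tensorflow` on TF 1.x)

input_dim, num_units, batch_size = 8, 32, 16

cell = tf.nn.rnn_cell.LSTMCell(num_units)

# One batch holds batch_size consecutive observations: (batch_size, input_dim).
observations = tf.placeholder(tf.float32, [batch_size, input_dim])

# Process the observations one at a time, feeding the state produced by
# observation i into observation i + 1 instead of resetting it.
state = cell.zero_state(1, tf.float32)
outputs = []
for i in range(batch_size):
    obs = tf.expand_dims(observations[i], 0)   # shape (1, input_dim)
    output, state = cell(obs, state)           # state is carried forward
    outputs.append(output)
outputs = tf.concat(outputs, axis=0)           # (batch_size, num_units)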

Multiply multiple tensors pairwise keras

夙愿已清 submitted on 2019-12-25 04:14:05
Question: I want to ask if it is possible to multiply two tensors pairwise. For example, I have the tensor output of an LSTM layer:

lstm = LSTM(128, return_sequences=True)(input)
output = some_function()(lstm)

some_function() should compute h1*h2, h2*h3, ..., h(n-1)*hn. I found "How do I take the squared difference of two Keras tensors?" a little helpful, but since I will have a trainable parameter, I will have to write my own layer. Also, will the some_function layer work out the dimension automatically, given that its output will have n-1 steps? I am
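
One way to get those pairwise products, shown here as a hedged sketch rather than an answer taken from the thread, is a Lambda layer that multiplies the sequence by a shifted copy of itself (no trainable parameters involved):

from tensorflow.keras.layers import Input, LSTM, Lambda
from tensorflow.keras.models import Model

inputs = Input(shape=(None, 64))                     # (batch, n, features)
h = LSTM(128, return_sequences=True)(inputs)         # h1 ... hn

# Elementwise products of consecutive hidden states: h1*h2, h2*h3, ..., h(n-1)*hn.
pairwise = Lambda(lambda x: x[:, :-1, :] * x[:, 1:, :])(h)   # (batch, n-1, 128)

model = Model(inputs, pairwise)
model.summary()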

How can I grid search different values for my keras model in python?

自闭症网瘾萝莉.ら submitted on 2019-12-25 02:15:33
Question: I've implemented an LSTM in Keras, using the following three hyperparameters:

embedding_size
hidden_layer_size
learning_rate

I now want to find the values that fit my model best. For example, I have 3 candidate values for each property (like embedding_size: [100, 150, 200], hidden_layer_size: [50, 100, 150], learning_rate: [0.015, 0.01, 0.005]), and I would love to know which combination works best. I thought I could build my function like this: def lstm
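
A minimal sketch of a plain grid search over those three hyperparameters with itertools.product; build_and_evaluate is a hypothetical stand-in for the asker's training-and-scoring function, not an existing API:

import itertools
import random

param_grid = {
    'embedding_size': [100, 150, 200],
    'hidden_layer_size': [50, 100, 150],
    'learning_rate': [0.015, 0.01, 0.005],
}

def build_and_evaluate(embedding_size, hidden_layer_size, learning_rate):
    """Hypothetical helper: build the Keras model, train it, and return a
    validation score (higher is better). A dummy value stands in here."""
    return random.random()

best_score, best_params = float('-inf'), None
for values in itertools.product(*param_grid.values()):
    params = dict(zip(param_grid.keys(), values))
    score = build_and_evaluate(**params)
    if score > best_score:
        best_score, best_params = score, params

print(best_params, best_score)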

How to mask the inputs in an LSTM autoencoder having a RepeatVector() layer?

泄露秘密 submitted on 2019-12-25 01:43:50
Question: I have been trying to obtain a vector representation of a sequence of vectors using an LSTM autoencoder, so that I can classify the sequence with an SVM or another supervised algorithm. The amount of data prevents me from using a fully connected dense layer for classification. The shortest of my inputs is 7 timesteps and the longest sequence is 356 timesteps. Accordingly, I have padded the shorter sequences with zeros to obtain a final x_train of shape (1326, 356, 8), where 1326
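
For reference, a minimal sketch of the kind of masked LSTM autoencoder being described, using the shapes quoted above; the latent size is illustrative. Note that RepeatVector does not propagate the mask, which is the crux of the question:

from tensorflow.keras.layers import Input, Masking, LSTM, RepeatVector, TimeDistributed, Dense
from tensorflow.keras.models import Model

timesteps, features, latent_dim = 356, 8, 64

inputs = Input(shape=(timesteps, features))
masked = Masking(mask_value=0.0)(inputs)          # ignore zero-padded timesteps in the encoder
encoded = LSTM(latent_dim)(masked)                # fixed-size vector representation

# RepeatVector tiles the encoding for the decoder, but it does not carry the
# mask forward, so the decoder and the loss still see the padded timesteps.
decoded = RepeatVector(timesteps)(encoded)
decoded = LSTM(latent_dim, return_sequences=True)(decoded)
decoded = TimeDistributed(Dense(features))(decoded)

autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer='adam', loss='mse')
encoder = Model(inputs, encoded)                  # later used to produce features for an SVM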

RNN and LSTM in TensorFlow

青春壹個敷衍的年華 submitted on 2019-12-25 00:33:48
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
"""
RNN in TensorFlow: classifying the handwritten-digit dataset
with a recurrent neural network.
"""
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Fix the random seed so the two computations can be compared.
tf.set_random_seed(1)

# Load the handwritten-digit dataset.
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

# Hyperparameters.
lr = 0.001
training_iters = 100000
batch_size = 128
n_inputs = 28         # MNIST images are 28*28; one row of pixels per step
n_steps = 28          # number of time steps, one per image row
n_hidden_units = 128  # number of neurons in the hidden layer
n_classes = 10        # MNIST has 10 classes

# Define the weights.
weights = {
    # (28, 128)
    'in': tf.Variable(tf.random_normal([n_inputs, n_hidden_units]))
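
The excerpt cuts off inside the weights dictionary. For completeness, a hedged sketch of how this style of TF1 tutorial typically continues; the variable names and the use of BasicLSTMCell with dynamic_rnn here are assumptions, not the rest of the original post:

# Sketch of a typical continuation (assumed, not the original post's code).
weights['out'] = tf.Variable(tf.random_normal([n_hidden_units, n_classes]))
biases = {
    'in': tf.Variable(tf.constant(0.1, shape=[n_hidden_units])),
    'out': tf.Variable(tf.constant(0.1, shape=[n_classes])),
}

def RNN(X, weights, biases):
    # (batch, 28 steps, 28 inputs) -> (batch * 28, 28) for the input projection.
    X = tf.reshape(X, [-1, n_inputs])
    X_in = tf.matmul(X, weights['in']) + biases['in']
    X_in = tf.reshape(X_in, [-1, n_steps, n_hidden_units])

    cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden_units)
    init_state = cell.zero_state(batch_size, dtype=tf.float32)
    outputs, final_state = tf.nn.dynamic_rnn(cell, X_in,
                                             initial_state=init_state,
                                             time_major=False)
    # final_state.h is the hidden state at the last time step.
    return tf.matmul(final_state.h, weights['out']) + biases['out']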

Keras LSTM multiple errors from trying to create model architecture

点点圈 submitted on 2019-12-24 21:06:40
Question: This is a duplicate of a question I posted earlier today; in the other question I was using an old version of Keras. I've upgraded to Keras 2.0.0 and was still getting a lot of errors that I can't figure out on my own, so I'm reposting the question mostly verbatim. I am trying to understand how to use Keras for supply-chain forecasting, and I keep getting errors that I can't find help for elsewhere. I've tried to follow similar tutorials: a sunspot forecasting tutorial, a pollution multivariate
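
As a point of reference, a minimal Keras 2-style LSTM regression model for this kind of time-series forecasting; the window length, feature count and layer sizes are placeholders, not the asker's architecture:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

timesteps, features = 12, 5          # e.g. 12 past periods, 5 input variables
model = Sequential([
    LSTM(32, input_shape=(timesteps, features)),
    Dense(1),                        # predict the next period's demand
])
model.compile(optimizer='adam', loss='mse')

# Dummy data just to show the expected shapes: (samples, timesteps, features).
X = np.random.rand(100, timesteps, features).astype('float32')
y = np.random.rand(100, 1).astype('float32')
model.fit(X, y, epochs=2, batch_size=16, verbose=0)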