LSTM

In Keras, what exactly am I configuring when I create a stateful `LSTM` layer with N `units`?

对着背影说爱祢 submitted on 2019-11-28 03:57:04
The first argument in a normal Dense layer is also units, and it is the number of neurons/nodes in that layer. A standard LSTM unit, however, looks like the following (this is a reworked version of "Understanding LSTM Networks"). In Keras, when I create an LSTM object like LSTM(units=N, ...), am I actually creating N of these LSTM units? Or is it the size of the "neural network" layers inside the LSTM unit, i.e., the W's in the formulas? Or is it something else? For context, I'm working from this example code. The documentation is here: https://keras.io/layers/recurrent/ …
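In short, units is the dimensionality of the LSTM's hidden state h and cell state c, and therefore the size of the internal weight matrices, rather than a count of N independent LSTM blocks. A minimal sketch that makes this visible through the parameter count (the input sizes below are arbitrary choices for illustration):

```python
from keras.models import Sequential
from keras.layers import LSTM

timesteps, input_dim, units = 10, 8, 32

model = Sequential()
# One LSTM layer whose hidden/cell state is a vector of length `units`.
model.add(LSTM(units, input_shape=(timesteps, input_dim)))
model.summary()

# Expected parameter count: 4 * units * (input_dim + units + 1)
#                         = 4 * 32 * (8 + 32 + 1) = 5248
# The factor of 4 comes from the input, forget, and output gates plus the cell
# candidate, each with weights of size units x (input_dim + units) and a bias of size units.
```

Doubling units roughly quadruples the recurrent weights (the units x units blocks), which matches the reading that units is a state size, not a number of parallel LSTM cells.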

How to deal with batches with variable-length sequences in TensorFlow?

好久不见. submitted on 2019-11-28 03:24:20
I was trying to use an RNN (specifically, an LSTM) for sequence prediction. However, I ran into an issue with variable sequence lengths. For example: sent_1 = "I am flying to Dubai", sent_2 = "I was traveling from the US to Dubai". I am trying to predict the next word after the current one with a simple RNN based on this benchmark for building a PTB LSTM model. However, the num_steps parameter (used for unrolling to the previous hidden states) should remain the same in each TensorFlow epoch. Basically, batching sentences is not possible as the sentences vary in length. # inputs = [tf.squeeze …
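A common way to handle this (not necessarily what the PTB example does) is to pad every sentence in a batch up to a fixed number of steps and pass the true lengths to tf.nn.dynamic_rnn via sequence_length, so the cell stops updating past each sequence's end. A rough TF 1.x sketch; the vocabulary size, embedding size, and placeholder names are assumptions for illustration:

```python
import tensorflow as tf

vocab_size, embed_dim, lstm_size, max_steps = 10000, 128, 256, 50

# Padded word ids, shape [batch, max_steps]; id 0 is assumed to be the padding token.
word_ids = tf.placeholder(tf.int32, [None, max_steps])
# Actual (unpadded) length of each sentence in the batch.
seq_len = tf.placeholder(tf.int32, [None])

embeddings = tf.get_variable("embeddings", [vocab_size, embed_dim])
inputs = tf.nn.embedding_lookup(embeddings, word_ids)

cell = tf.nn.rnn_cell.BasicLSTMCell(lstm_size)
# dynamic_rnn zeroes the outputs and carries the last valid state forward
# beyond each sentence's true length, so padding does not pollute the state.
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs,
                                         sequence_length=seq_len,
                                         dtype=tf.float32)
```

The loss would then be masked with the same lengths (e.g., via tf.sequence_mask) so padded positions do not contribute to training.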

How do I create a variable-length input LSTM in Keras?

一世执手 submitted on 2019-11-28 03:20:21
I am trying to do some vanilla pattern recognition with an LSTM in Keras to predict the next element in a sequence. My data look like this, where the label of a training sequence is the last element in the list: X_train['Sequence'][n][-1]. Because my Sequence column can have a variable number of elements, I believe an RNN is the best model to use. Below is my attempt to build an LSTM in Keras: # Build the model # A few arbitrary constants... max_features = 20000 out_size = 128 # The max length should be the length of the longest sequence (minus one to account for the …
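One common Keras approach (a sketch of the general pattern, not necessarily the fix the original poster settled on) is to pad all sequences to a common length with pad_sequences and let an Embedding layer with mask_zero=True skip the padding; max_features and out_size are reused from the excerpt above, while maxlen and the toy data are assumptions:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense
from keras.preprocessing.sequence import pad_sequences

max_features = 20000   # size of the integer id space, as in the excerpt
out_size = 128
maxlen = 30            # assumed length of the longest training sequence (minus one)

# Toy variable-length integer sequences; real data would come from X_train['Sequence'].
sequences = [[1, 4, 9, 2], [7, 3, 5], [5, 8, 8, 1, 6]]
X = pad_sequences([s[:-1] for s in sequences], maxlen=maxlen)   # inputs
y = np.array([s[-1] for s in sequences])                        # label = last element

model = Sequential()
# mask_zero=True reserves id 0 for padding and makes the LSTM ignore padded steps.
model.add(Embedding(max_features, out_size, mask_zero=True))
model.add(LSTM(128))
model.add(Dense(max_features, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
model.fit(X, y, epochs=1, batch_size=2)
```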

What is num_units in tensorflow BasicLSTMCell?

◇◆丶佛笑我妖孽 submitted on 2019-11-28 02:49:13
In the MNIST LSTM examples, I don't understand what "hidden layer" means. Is it the imaginary layer formed when you represent an unrolled RNN over time? Why is num_units = 128 in most cases? I know I should read colah's blog in detail to understand this, but before that I just want to get some code working with sample time-series data I have. nobar: The number of hidden units is a direct representation of the learning capacity of a neural network; it reflects the number of learned parameters. The value 128 was likely selected arbitrarily or empirically. You can change that value …
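As a rough illustration (TF 1.x style, with made-up sizes matching the 28x28 MNIST setup): num_units is simply the length of the hidden-state vector of a single BasicLSTMCell, so changing it changes both the output shape and the parameter count.

```python
import tensorflow as tf

batch_size, num_steps, input_dim, num_units = 32, 28, 28, 128

x = tf.placeholder(tf.float32, [batch_size, num_steps, input_dim])

cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
outputs, state = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)

print(outputs.shape)    # (32, 28, 128): one 128-dimensional hidden vector per timestep
print(cell.state_size)  # LSTMStateTuple(c=128, h=128)
# The cell's kernel has shape (input_dim + num_units, 4 * num_units) = (156, 512),
# i.e. roughly 80k learned weights for this single cell; that is the sense in which
# num_units reflects the number of learned parameters.
```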

Keras Chinese documentation notes 17: Using Keras as a simplified interface to TensorFlow

▼魔方 西西 submitted on 2019-11-28 02:36:36
Using Keras as a simplified interface to TensorFlow. Calling Keras layers in TensorFlow. Let us start with a simple example: MNIST digit classification. We will build a TensorFlow classifier by stacking Keras fully connected layers. import tensorflow as tf sess = tf.Session() from keras import backend as K K.set_session(sess) Then we start building the model with TensorFlow: # this placeholder will contain our input digits, as flat vectors img = tf.placeholder(tf.float32, shape=(None, 784)) Keras can speed up the model-definition step: from keras.layers import Dense # Keras layers can be called on TensorFlow tensors: x = Dense(128, activation='relu')(img) # fully-connected layer with 128 units and ReLU activation x = Dense(128, activation='relu')(x) preds = …
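The excerpt cuts off at the output layer. A plausible continuation in the same pattern (a sketch following the style of that guide, not a verbatim copy) finishes the classifier and trains it with a plain TensorFlow optimizer:

```python
import tensorflow as tf
from keras import backend as K
from keras.layers import Dense
from keras.losses import categorical_crossentropy  # keras.objectives in older versions

sess = tf.Session()
K.set_session(sess)

img = tf.placeholder(tf.float32, shape=(None, 784))
labels = tf.placeholder(tf.float32, shape=(None, 10))

x = Dense(128, activation='relu')(img)
x = Dense(128, activation='relu')(x)
preds = Dense(10, activation='softmax')(x)      # 10-way softmax over digit classes

# A Keras loss applied to TensorFlow tensors, minimized by a TensorFlow optimizer.
loss = tf.reduce_mean(categorical_crossentropy(labels, preds))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

sess.run(tf.global_variables_initializer())
# Training then feeds MNIST batches into img and labels:
#   sess.run(train_step, feed_dict={img: batch_x, labels: batch_y})
```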

TypeError: can't pickle _thread.lock objects in Seq2Seq

时光总嘲笑我的痴心妄想 submitted on 2019-11-28 00:56:25
I'm having trouble using buckets in my TensorFlow model. When I run it with buckets = [(100, 100)], it works fine. When I run it with buckets = [(100, 100), (200, 200)] it doesn't work at all (stacktrace at bottom). Interestingly, running TensorFlow's Seq2Seq tutorial gives the same kind of issue with a nearly identical stacktrace. For testing purposes, the link to the repository is here. I'm not sure what the issue is, but having more than one bucket always seems to trigger it. This code won't work as a standalone, but this is the function where it is crashing; remember that changing …

Deep learning techniques in machine translation: CNN, Seq2Seq, SGAN, Dual Learning

允我心安 submitted on 2019-11-28 00:23:47
Machine translation is one of the most active and most promising directions where deep learning meets NLP. From the earliest machine translation methods based entirely on hand-written rules, to the later statistics-based SMT methods, and now to neural machine translation (NMT), the technology has kept being renewed over more than 60 years. In particular, since deep learning came into view around 2012, translation accuracy has kept improving. This post surveys how deep learning is currently applied in machine translation and lists some representative papers to study. The biggest advantages of deep-learning-based neural machine translation (NMT) are: 1. it adopts an end-to-end architecture, so features no longer need to be extracted by hand; 2. the network structure is simple to design, with no need for word segmentation, word alignment, syntax-tree design, or other complex engineering. At the same time, the drawbacks of this approach are also obvious: 1. poor interpretability; taking seq2seq as an example, it is hard to explain or understand the hidden-layer outputs, i.e., the concrete physical meaning of the encoder output; 2. high training complexity, costly in both time and effort. Deep learning training sets often run to hundreds of millions of samples; training a model requires a dedicated GPU cluster and takes days or even a week to produce a result, so model iteration is very slow. Although NMT has its shortcomings, on the whole the benefits far outweigh the drawbacks. Below are some concrete applications of deep learning in machine translation: …

Paper reading - Social LSTM: Human Trajectory Prediction in Crowded Spaces

☆樱花仙子☆ submitted on 2019-11-28 00:23:09
Social LSTM: Human Trajectory Prediction in Crowded Spaces. Study notes reference: https://www.zybuluo.com/ArrowLLL/note/981714 Abstract: Pedestrians follow different trajectories to avoid obstacles and accommodate other pedestrians. Any autonomous vehicle navigating such a scene should be able to foresee the future positions of pedestrians and adjust its path accordingly to avoid collisions. This trajectory-prediction problem can be viewed as a sequence-generation task in which we are interested in predicting people's future trajectories from their past positions. Following the recent success of recurrent neural network (RNN) models on sequence-prediction tasks, we propose an LSTM model that can learn general human motion and predict future trajectories. This contrasts with traditional approaches that use hand-crafted features such as social forces. We demonstrate the performance of our method on several public datasets; our model outperforms state-of-the-art methods on some of them. We also analyze the trajectories predicted by our model to show the motion behaviors it has learned. Limitations of traditional methods: (i) they use hand-crafted features to model "interactions" for specific settings rather than inferring them in a data-driven way, which favors models that capture simple interactions (e.g., repulsion/attraction) and may fail to generalize to more complex crowded settings; (ii) they focus on modeling interactions between people who are very close to each other (to avoid immediate collisions), but do not anticipate interactions that will occur in the more distant future. We also analyze the trajectory patterns generated by the model to understand the social constraints learned from the trajectory datasets. 3.1 Social LSTM. Every person has a different motion pattern: they move at different speeds …
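Section 3.1 goes on to assign one LSTM with shared weights to each pedestrian, fed with that pedestrian's coordinates at every timestep, and then couples nearby pedestrians through a social pooling layer. Below is a very reduced sketch of only the per-pedestrian part, written in Keras for consistency with the rest of this page; all sizes are illustrative, and the actual paper predicts the parameters of a bivariate Gaussian over the next position (and adds social pooling) rather than regressing raw coordinates:

```python
from keras.models import Model
from keras.layers import Input, LSTM, Dense, TimeDistributed

obs_len, hidden = 8, 128   # observe 8 timesteps of (x, y); hidden size is illustrative

# One shared model is applied to every pedestrian (shared weights), so general
# motion patterns are learned across people rather than separately per person.
coords = Input(shape=(obs_len, 2))              # past (x, y) positions of one pedestrian
h = LSTM(hidden, return_sequences=True)(coords)
next_xy = TimeDistributed(Dense(2))(h)          # predicted (x, y) for the next timestep

model = Model(coords, next_xy)
model.compile(loss='mse', optimizer='adam')
```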

[NLP] Understanding BERT thoroughly

不打扰是莪最后的温柔 submitted on 2019-11-27 23:33:37
It has been a long time since I updated this blog; I sometimes jot things down in a notebook or in Evernote, and scattered notes lead to scattered memories o(╥﹏╥)o. Organizing them on the blog keeps them coherent and makes them easier to look up. Since Google announced BERT's outstanding results on 11 NLP tasks at the end of October 2018, BERT (Bidirectional Encoder Representation from Transformers) has become a red-hot model in NLP that even the wider ML community has heard of. There are many introductions online, but a lot of them contain too little technical detail, or are incomplete and only half understood, and most of the content is duplicated (a mild complaint here about the lack of diversity in Baidu's search results). In one sentence: the arrival of BERT completely changed the relationship between pre-trained word vectors and downstream NLP tasks, introducing the idea of pre-trained representations as the "keel" (backbone) of the model. Contents: word-vector models: comparing word2vec, ELMo, and BERT; BERT in detail: Masked LM, Transformer, sentence-level tasks; transfer strategies: the interface that downstream NLP tasks call; results: breaking the record on 11 NLP tasks. 1. Word-vector models. Here we mainly compare word2vec, ELMo, and BERT side by side, focusing on each model's highlights and their differences. Traditionally, a word-vector model is a tool that converts words, which exist abstractly in the real world, into vectors that can be manipulated with mathematical formulas, and operating on those vectors is what NLP actually has to do. So in a sense an NLP task splits into two parts: …

Predicting the next word using the LSTM ptb model tensorflow example

只愿长相守 submitted on 2019-11-27 22:20:59
I am trying to use the TensorFlow LSTM model to make next-word predictions. As described in this related question (which has no accepted answer), the example contains pseudocode to extract next-word probabilities: lstm = rnn_cell.BasicLSTMCell(lstm_size) # Initial state of the LSTM memory. state = tf.zeros([batch_size, lstm.state_size]) loss = 0.0 for current_batch_of_words in words_in_dataset: # The value of state is updated after processing each batch of words. output, state = lstm(current_batch_of_words, state) # The LSTM output can be used to make next word predictions logits = tf.matmul …
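A hedged sketch of what extracting the next-word probabilities typically looks like once the logits are formed: project the LSTM output with a softmax weight matrix, apply tf.nn.softmax, and take the argmax (or look up specific word ids). The variable names and shapes below are illustrative, not taken from the PTB code:

```python
import tensorflow as tf

lstm_size, vocab_size = 200, 10000

# `output` stands for the LSTM output at the current step, shape [batch_size, lstm_size].
output = tf.placeholder(tf.float32, [None, lstm_size])

# Projection from the hidden state onto the vocabulary.
softmax_w = tf.get_variable("softmax_w", [lstm_size, vocab_size])
softmax_b = tf.get_variable("softmax_b", [vocab_size])

logits = tf.matmul(output, softmax_w) + softmax_b
probabilities = tf.nn.softmax(logits)            # shape [batch_size, vocab_size]
predicted_next_word = tf.argmax(probabilities, axis=1)
```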