LSTM

In Keras, what exactly am I configuring when I create a stateful `LSTM` layer with N `units`?

对着背影说爱祢 submitted on 2019-11-28 03:57:04
The first argument in a normal Dense layer is also units, and it is the number of neurons/nodes in that layer. A standard LSTM unit, however, looks like the following (this is a reworked version of "Understanding LSTM Networks"). In Keras, when I create an LSTM object like LSTM(units=N, ...), am I actually creating N of these LSTM units? Or is it the size of the "neural network" layers inside the LSTM unit, i.e., the W's in the formulas? Or is it something else? For context, I'm working from this example code. The documentation is here: https://keras.io/layers/recurrent/ …
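In short, units is the dimensionality of the LSTM's hidden state h and cell state c, and therefore the size of the internal weight matrices, rather than a count of N independent LSTM blocks. A minimal sketch that makes this visible through the parameter count (the input sizes below are arbitrary choices for illustration):

```python
from keras.models import Sequential
from keras.layers import LSTM

timesteps, input_dim, units = 10, 8, 32

model = Sequential()
# One LSTM layer whose hidden/cell state is a vector of length `units`.
model.add(LSTM(units, input_shape=(timesteps, input_dim)))
model.summary()

# Expected parameter count: 4 * units * (input_dim + units + 1)
#                         = 4 * 32 * (8 + 32 + 1) = 5248
# The factor of 4 comes from the input, forget, and output gates plus the cell
# candidate, each with weights of size units x (input_dim + units) and a bias of size units.
```

Doubling units roughly quadruples the recurrent weights (the units x units blocks), which matches the reading that units is a state size, not a number of parallel LSTM cells.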

How to deal with batches with variable-length sequences in TensorFlow?

好久不见. submitted on 2019-11-28 03:24:20
I was trying to use an RNN (specifically, an LSTM) for sequence prediction. However, I ran into an issue with variable sequence lengths. For example: sent_1 = "I am flying to Dubai", sent_2 = "I was traveling from the US to Dubai". I am trying to predict the next word after the current one with a simple RNN based on this benchmark for building a PTB LSTM model. However, the num_steps parameter (used for unrolling to the previous hidden states) should remain the same in each TensorFlow epoch. Basically, batching sentences is not possible as the sentences vary in length. # inputs = [tf.squeeze …
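A common way to handle this (not necessarily what the PTB example does) is to pad every sentence in a batch up to a fixed number of steps and pass the true lengths to tf.nn.dynamic_rnn via sequence_length, so the cell stops updating past each sequence's end. A rough TF 1.x sketch; the vocabulary size, embedding size, and placeholder names are assumptions for illustration:

```python
import tensorflow as tf

vocab_size, embed_dim, lstm_size, max_steps = 10000, 128, 256, 50

# Padded word ids, shape [batch, max_steps]; id 0 is assumed to be the padding token.
word_ids = tf.placeholder(tf.int32, [None, max_steps])
# Actual (unpadded) length of each sentence in the batch.
seq_len = tf.placeholder(tf.int32, [None])

embeddings = tf.get_variable("embeddings", [vocab_size, embed_dim])
inputs = tf.nn.embedding_lookup(embeddings, word_ids)

cell = tf.nn.rnn_cell.BasicLSTMCell(lstm_size)
# dynamic_rnn zeroes the outputs and carries the last valid state forward
# beyond each sentence's true length, so padding does not pollute the state.
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs,
                                         sequence_length=seq_len,
                                         dtype=tf.float32)
```

The loss would then be masked with the same lengths (e.g., via tf.sequence_mask) so padded positions do not contribute to training.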

How do I create a variable-length input LSTM in Keras?

一世执手 submitted on 2019-11-28 03:20:21
I am trying to do some vanilla pattern recognition with an LSTM in Keras to predict the next element in a sequence. My data look like this, where the label of a training sequence is the last element in the list: X_train['Sequence'][n][-1]. Because my Sequence column can have a variable number of elements, I believe an RNN is the best model to use. Below is my attempt to build an LSTM in Keras: # Build the model # A few arbitrary constants... max_features = 20000 out_size = 128 # The max length should be the length of the longest sequence (minus one to account for the …
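One common Keras approach (a sketch of the general pattern, not necessarily the fix the original poster settled on) is to pad all sequences to a common length with pad_sequences and let an Embedding layer with mask_zero=True skip the padding; max_features and out_size are reused from the excerpt above, while maxlen and the toy data are assumptions:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense
from keras.preprocessing.sequence import pad_sequences

max_features = 20000   # size of the integer id space, as in the excerpt
out_size = 128
maxlen = 30            # assumed length of the longest training sequence (minus one)

# Toy variable-length integer sequences; real data would come from X_train['Sequence'].
sequences = [[1, 4, 9, 2], [7, 3, 5], [5, 8, 8, 1, 6]]
X = pad_sequences([s[:-1] for s in sequences], maxlen=maxlen)   # inputs
y = np.array([s[-1] for s in sequences])                        # label = last element

model = Sequential()
# mask_zero=True reserves id 0 for padding and makes the LSTM ignore padded steps.
model.add(Embedding(max_features, out_size, mask_zero=True))
model.add(LSTM(128))
model.add(Dense(max_features, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
model.fit(X, y, epochs=1, batch_size=2)
```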

What is num_units in tensorflow BasicLSTMCell?

◇◆丶佛笑我妖孽 submitted on 2019-11-28 02:49:13
In the MNIST LSTM examples, I don't understand what "hidden layer" means. Is it the imaginary layer formed when you represent an unrolled RNN over time? Why is num_units = 128 in most cases? I know I should read colah's blog in detail to understand this, but before that I just want to get some code working with sample time-series data I have. nobar: The number of hidden units is a direct representation of the learning capacity of a neural network; it reflects the number of learned parameters. The value 128 was likely selected arbitrarily or empirically. You can change that value …
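As a rough illustration (TF 1.x style, with made-up sizes matching the 28x28 MNIST setup): num_units is simply the length of the hidden-state vector of a single BasicLSTMCell, so changing it changes both the output shape and the parameter count.

```python
import tensorflow as tf

batch_size, num_steps, input_dim, num_units = 32, 28, 28, 128

x = tf.placeholder(tf.float32, [batch_size, num_steps, input_dim])

cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
outputs, state = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)

print(outputs.shape)    # (32, 28, 128): one 128-dimensional hidden vector per timestep
print(cell.state_size)  # LSTMStateTuple(c=128, h=128)
# The cell's kernel has shape (input_dim + num_units, 4 * num_units) = (156, 512),
# i.e. roughly 80k learned weights for this single cell; that is the sense in which
# num_units reflects the number of learned parameters.
```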

Keras Chinese documentation notes 17: Using Keras as a simplified interface to TensorFlow

▼魔方 西西 submitted on 2019-11-28 02:36:36
Using Keras as a simplified interface to TensorFlow. Calling Keras layers in TensorFlow. Let us start with a simple example: MNIST digit classification. We will build a TensorFlow classifier by stacking Keras fully connected layers. import tensorflow as tf sess = tf.Session() from keras import backend as K K.set_session(sess) Then we start building the model with TensorFlow: # this placeholder will contain our input digits, as flat vectors img = tf.placeholder(tf.float32, shape=(None, 784)) Keras can speed up the model-definition step: from keras.layers import Dense # Keras layers can be called on TensorFlow tensors: x = Dense(128, activation='relu')(img) # fully-connected layer with 128 units and ReLU activation x = Dense(128, activation='relu')(x) preds = …
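The excerpt cuts off at the output layer. A plausible continuation in the same pattern (a sketch following the style of that guide, not a verbatim copy) finishes the classifier and trains it with a plain TensorFlow optimizer:

```python
import tensorflow as tf
from keras import backend as K
from keras.layers import Dense
from keras.losses import categorical_crossentropy  # keras.objectives in older versions

sess = tf.Session()
K.set_session(sess)

img = tf.placeholder(tf.float32, shape=(None, 784))
labels = tf.placeholder(tf.float32, shape=(None, 10))

x = Dense(128, activation='relu')(img)
x = Dense(128, activation='relu')(x)
preds = Dense(10, activation='softmax')(x)      # 10-way softmax over digit classes

# A Keras loss applied to TensorFlow tensors, minimized by a TensorFlow optimizer.
loss = tf.reduce_mean(categorical_crossentropy(labels, preds))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

sess.run(tf.global_variables_initializer())
# Training then feeds MNIST batches into img and labels:
#   sess.run(train_step, feed_dict={img: batch_x, labels: batch_y})
```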

TypeError: can't pickle _thread.lock objects in Seq2Seq

时光总嘲笑我的痴心妄想 submitted on 2019-11-28 00:56:25
I'm having trouble using buckets in my TensorFlow model. When I run it with buckets = [(100, 100)], it works fine. When I run it with buckets = [(100, 100), (200, 200)] it doesn't work at all (stacktrace at bottom). Interestingly, running TensorFlow's Seq2Seq tutorial gives the same kind of issue with a nearly identical stacktrace. For testing purposes, the link to the repository is here. I'm not sure what the issue is, but having more than one bucket always seems to trigger it. This code won't work as a standalone, but this is the function where it is crashing; remember that changing …

Deep learning techniques in machine translation: CNN, Seq2Seq, SGAN, Dual Learning

允我心安 submitted on 2019-11-28 00:23:47
Machine translation is one of the most active and most promising directions where deep learning meets NLP. From the earliest machine translation methods based entirely on hand-written rules, to the later statistics-based SMT methods, and now to neural machine translation (NMT), the technology has kept being renewed over more than 60 years. In particular, since deep learning came into view around 2012, translation accuracy has kept improving. This post surveys how deep learning is currently applied in machine translation and lists some representative papers to study. The biggest advantages of deep-learning-based neural machine translation (NMT) are: 1. it adopts an end-to-end architecture, so features no longer need to be extracted by hand; 2. the network structure is simple to design, with no need for word segmentation, word alignment, syntax-tree design, or other complex engineering. At the same time, the drawbacks of this approach are also obvious: 1. poor interpretability; taking seq2seq as an example, it is hard to explain or understand the hidden-layer outputs, i.e., the concrete physical meaning of the encoder output; 2. high training complexity, costly in both time and effort. Deep learning training sets often run to hundreds of millions of samples; training a model requires a dedicated GPU cluster and takes days or even a week to produce a result, so model iteration is very slow. Although NMT has its shortcomings, on the whole the benefits far outweigh the drawbacks. Below are some concrete applications of deep learning in machine translation: …

Paper reading - Social LSTM: Human Trajectory Prediction in Crowded Spaces

☆樱花仙子☆ submitted on 2019-11-28 00:23:09
Social LSTM: Human Trajectory Prediction in Crowded Spaces. Study notes reference: https://www.zybuluo.com/ArrowLLL/note/981714 Abstract: Pedestrians follow different trajectories to avoid obstacles and accommodate other pedestrians. Any autonomous vehicle navigating such a scene should be able to foresee the future positions of pedestrians and adjust its path accordingly to avoid collisions. This trajectory-prediction problem can be viewed as a sequence-generation task in which we are interested in predicting people's future trajectories from their past positions. Following the recent success of recurrent neural network (RNN) models on sequence-prediction tasks, we propose an LSTM model that can learn general human motion and predict future trajectories. This contrasts with traditional approaches that use hand-crafted features such as social forces. We demonstrate the performance of our method on several public datasets; our model outperforms state-of-the-art methods on some of them. We also analyze the trajectories predicted by our model to show the motion behaviors it has learned. Limitations of traditional methods: (i) they use hand-crafted features to model "interactions" for specific settings rather than inferring them in a data-driven way, which favors models that capture simple interactions (e.g., repulsion/attraction) and may fail to generalize to more complex crowded settings; (ii) they focus on modeling interactions between people who are very close to each other (to avoid immediate collisions), but do not anticipate interactions that will occur in the more distant future. We also analyze the trajectory patterns generated by the model to understand the social constraints learned from the trajectory datasets. 3.1 Social LSTM. Every person has a different motion pattern: they move at different speeds …
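Section 3.1 goes on to assign one LSTM with shared weights to each pedestrian, fed with that pedestrian's coordinates at every timestep, and then couples nearby pedestrians through a social pooling layer. Below is a very reduced sketch of only the per-pedestrian part, written in Keras for consistency with the rest of this page; all sizes are illustrative, and the actual paper predicts the parameters of a bivariate Gaussian over the next position (and adds social pooling) rather than regressing raw coordinates:

```python
from keras.models import Model
from keras.layers import Input, LSTM, Dense, TimeDistributed

obs_len, hidden = 8, 128   # observe 8 timesteps of (x, y); hidden size is illustrative

# One shared model is applied to every pedestrian (shared weights), so general
# motion patterns are learned across people rather than separately per person.
coords = Input(shape=(obs_len, 2))              # past (x, y) positions of one pedestrian
h = LSTM(hidden, return_sequences=True)(coords)
next_xy = TimeDistributed(Dense(2))(h)          # predicted (x, y) for the next timestep

model = Model(coords, next_xy)
model.compile(loss='mse', optimizer='adam')
```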

[NLP] Understanding BERT thoroughly

不打扰是莪最后的温柔 submitted on 2019-11-27 23:33:37
It has been a long time since I updated this blog; I sometimes jot things down in a notebook or in Evernote, and scattered notes lead to scattered memories o(╥﹏╥)o. Organizing them on the blog keeps them coherent and makes them easier to look up. Since Google announced BERT's outstanding results on 11 NLP tasks at the end of October 2018, BERT (Bidirectional Encoder Representation from Transformers) has become a red-hot model in NLP that even the wider ML community has heard of. There are many introductions online, but a lot of them contain too little technical detail, or are incomplete and only half understood, and most of the content is duplicated (a mild complaint here about the lack of diversity in Baidu's search results). In one sentence: the arrival of BERT completely changed the relationship between pre-trained word vectors and downstream NLP tasks, introducing the idea of pre-trained representations as the "keel" (backbone) of the model. Contents: word-vector models: comparing word2vec, ELMo, and BERT; BERT in detail: Masked LM, Transformer, sentence-level tasks; transfer strategies: the interface that downstream NLP tasks call; results: breaking the record on 11 NLP tasks. 1. Word-vector models. Here we mainly compare word2vec, ELMo, and BERT side by side, focusing on each model's highlights and their differences. Traditionally, a word-vector model is a tool that converts words, which exist abstractly in the real world, into vectors that can be manipulated with mathematical formulas, and operating on those vectors is what NLP actually has to do. So in a sense an NLP task splits into two parts: …

Predicting the next word using the LSTM ptb model tensorflow example

只愿长相守 submitted on 2019-11-27 22:20:59
I am trying to use the TensorFlow LSTM model to make next-word predictions. As described in this related question (which has no accepted answer), the example contains pseudocode to extract next-word probabilities: lstm = rnn_cell.BasicLSTMCell(lstm_size) # Initial state of the LSTM memory. state = tf.zeros([batch_size, lstm.state_size]) loss = 0.0 for current_batch_of_words in words_in_dataset: # The value of state is updated after processing each batch of words. output, state = lstm(current_batch_of_words, state) # The LSTM output can be used to make next word predictions logits = tf.matmul …
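A hedged sketch of what extracting the next-word probabilities typically looks like once the logits are formed: project the LSTM output with a softmax weight matrix, apply tf.nn.softmax, and take the argmax (or look up specific word ids). The variable names and shapes below are illustrative, not taken from the PTB code:

```python
import tensorflow as tf

lstm_size, vocab_size = 200, 10000

# `output` stands for the LSTM output at the current step, shape [batch_size, lstm_size].
output = tf.placeholder(tf.float32, [None, lstm_size])

# Projection from the hidden state onto the vocabulary.
softmax_w = tf.get_variable("softmax_w", [lstm_size, vocab_size])
softmax_b = tf.get_variable("softmax_b", [vocab_size])

logits = tf.matmul(output, softmax_w) + softmax_b
probabilities = tf.nn.softmax(logits)            # shape [batch_size, vocab_size]
predicted_next_word = tf.argmax(probabilities, axis=1)
```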