attention | 易学教程

Weekly Report 5

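This week I mainly studied the principle of self-attention and closely read the paper "A Self-Attentive model for Knowledge Tracing", together with someone else's code reproducing it.

1. Introduction to self-attention

Between 2015 and 2017, after attention was proposed, it became a near-standard component of NLP models. "Attention Is All You Need" points out two things: 1. relying on the attention mechanism alone, without RNNs or CNNs, gives a high degree of parallelism; 2. attention captures long-range dependencies better than an RNN.

(1) Basic structure of self-attention:

Figure 1

The three matrices Q (Query), K (Key) and V (Value) all come from the same input.

(2) Computing self-attention

Step 1: generate three vectors from each encoder input vector (the word embedding of each word). That is, for each word, create a query vector Q, a key vector K and a value vector V. All three are created by multiplying the word embedding by three weight matrices, as shown in the figure below.

Figure 2

Multiplying X1 by the weight matrix W^Q gives q1, the query vector associated with this word; the key and value vectors are obtained in the same way. W^Q, W^K and W^V are randomly initialized and then learned during training.

Step 2: compute the scores. Computing the self-attention vector of the first word in this example, "Thinking"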
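The steps described here can be sketched in NumPy. This is a minimal sketch, not the code of the paper reproduction mentioned in the report. Because the excerpt breaks off at step 2, the remaining steps (scaling by the square root of the key dimension, softmax, weighted sum of the value vectors) are assumed to follow the standard scaled dot-product form from "Attention Is All You Need". The function name `self_attention`, the dimensions and the random seed are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """X: (num_words, d_model). Returns the attended vectors and the weights."""
    # Step 1: project each word embedding into query, key and value vectors.
    Q = X @ W_q
    K = X @ W_k
    V = X @ W_v
    # Step 2: score every word against every other word, scaled by sqrt(d_k).
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Assumed remaining steps: normalize each row, then mix the value vectors.
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 4))  # two words, embedding size 4 (illustrative)
W_q, W_k, W_v = (rng.normal(size=(4, 3)) for _ in range(3))  # random init
Z, A = self_attention(X, W_q, W_k, W_v)
print(Z.shape, A.sum(axis=1))
```

Each row of `A` holds one word's attention distribution over all words of the input, so each row sums to 1; row 0 of `Z` is the self-attention vector of the first word.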

Tensorflow seq2seq get sequence hidden state

Question: I just started working with TensorFlow not long ago. I'm working on the seq2seq model and somehow got the tutorial to work, but I'm stuck at getting the states of each sentence. As far as I understand, the seq2seq model takes an input sequence and generates a hidden state for the sequence through an RNN. Later, the model uses the sequence's hidden state to generate a new sequence of data. My problem is: what should I do if I want to use the hidden state of the input sequence directly? Say, for example, I have a trained model, how should I get the
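To make the idea of a sequence's hidden state concrete, here is a minimal NumPy sketch of a vanilla RNN encoder. It is deliberately not TensorFlow code and does not reproduce the tutorial's API; the function `rnn_encode`, the tanh cell, and all sizes and weights are assumptions for illustration only.

```python
import numpy as np

def rnn_encode(inputs, W_xh, W_hh, b_h):
    """Run a tanh RNN over `inputs` (T, input_dim).

    Returns every per-step hidden state and the final one, which is the
    vector an encoder-decoder model would hand to its decoder.
    """
    h = np.zeros(W_hh.shape[0])
    states = []
    for x_t in inputs:
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
        states.append(h)
    return np.stack(states), h

rng = np.random.default_rng(0)
seq = rng.normal(size=(5, 3))            # 5 time steps, 3 features (made up)
W_xh = rng.normal(size=(3, 4)) * 0.1
W_hh = rng.normal(size=(4, 4)) * 0.1
b_h = np.zeros(4)
all_states, final_state = rnn_encode(seq, W_xh, W_hh, b_h)
print(all_states.shape, final_state.shape)
```

In a trained TensorFlow model the corresponding vector is whatever tensor the encoder returns as its final state; how to fetch it depends on the tutorial code, which the excerpt does not show.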

LSTM with Attention

Question: I am trying to add an attention mechanism to a stacked LSTM implementation, https://github.com/salesforce/awd-lstm-lm. All the examples online use an encoder-decoder architecture, which I do not want to use (do I have to, for the attention mechanism?). Basically, I have used https://webcache.googleusercontent.com/search?q=cache:81Q7u36DRPIJ:https://github.com/zhedongzheng/finch/blob/master/nlp-models/pytorch/rnn_attn_text_clf.py+&cd=2&hl=en&ct=clnk&gl=uk
def __init__(self, rnn_type, ntoken, ninp, nhid, nlayers, dropout=0.5, dropouth=0.5, dropouti=0.5,
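On whether an encoder-decoder is required: attention can also be used as a weighted pooling over the hidden states of a single LSTM stack. The NumPy sketch below illustrates that idea only. It is not the code of either linked repository; `attention_pool`, the scoring vector `w` and all shapes are hypothetical.

```python
import numpy as np

def attention_pool(H, w):
    """Pool hidden states H (T, hidden) into one context vector.

    `w` (hidden,) is a learned scoring vector; here it is just random.
    """
    scores = H @ w                      # one scalar score per time step
    scores = scores - scores.max()      # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    context = alpha @ H                 # weighted sum of the hidden states
    return context, alpha

rng = np.random.default_rng(0)
H = rng.normal(size=(6, 8))   # stand-in for the top-layer LSTM outputs
w = rng.normal(size=8)
context, alpha = attention_pool(H, w)

# With an all-zero scoring vector the weights are uniform, so the result
# reduces to plain mean pooling.
mean_context, uniform_alpha = attention_pool(H, np.zeros(8))
```

For a language model, the pooling at position t would have to be restricted to states up to t, so that the prediction cannot see future tokens.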

Find column name in pandas that matches an array

Question: I have a large dataframe (5000 x 12039) and I want to get the column name that matches a numpy array. For example, if I have the table
m1lenhr m1lenmin m1citywt m1a12a cm1age cm1numb m1b1a m1b1b m1b12a m1b12b ... kind_attention_scale_10 kind_attention_scale_22 kind_attention_scale_21 kind_attention_scale_15 kind_attention_scale_18 kind_attention_scale_19 kind_attention_scale_25 kind_attention_scale_24 kind_attention_scale_27 kind_attention_scale_23
challengeID
1 0.130765 40.0 202.485367 1.893256 27.0 1.0 2.0 0.0 2.254198 2.289966 .
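One way to approach this is to compare the array against every column at once and keep the columns where all rows agree. The sketch below assumes all columns are numeric and that the array has one value per row; the helper `columns_matching` is hypothetical, and the toy values are made up (only the column names are borrowed from the question).

```python
import numpy as np
import pandas as pd

def columns_matching(df, values, atol=1e-8):
    """Return the names of the columns whose values equal `values` row by row."""
    arr = np.asarray(values, dtype=float)
    # Broadcast the target as a column vector against the (rows, cols) matrix.
    close = np.isclose(df.to_numpy(dtype=float), arr[:, None],
                       atol=atol, equal_nan=True)
    return df.columns[close.all(axis=0)].tolist()

# Toy frame with made-up values.
df = pd.DataFrame({
    "m1lenhr": [0.5, 1.0, 2.0],
    "cm1age": [27.0, 31.0, 19.0],
    "m1b1a": [0.5, 1.0, 2.0],
})
target = np.array([0.5, 1.0, 2.0])
matches = columns_matching(df, target)
print(matches)
```

`np.isclose` is used instead of `==` so that floating-point round-off does not hide a match, and `equal_nan=True` lets missing values line up with each other.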

Abstractive Summarization

Abstractive Summarization
A Neural Attention Model for Abstractive Sentence Summarization
Alexander M. Rush et al., Facebook AI Research/Harvard, EMNLP 2015, sentence level
The seq2seq model was proposed in 2014, and this is one of the earlier papers to apply it to the abstractive summarization task. The same group also published a NAACL 2016 paper (Sumit Chopra, Facebook AI Research, "Abstractive sentence summarization with attentive recurrent neural networks"), with largely the same authors, which makes further improvements on this one and achieves better results. Both papers are classic baselines for seq2seq models on abstractive summarization. The objective function is the negative log likelihood, optimized with mini-batch SGD. The paper proposes three encoders, with the focus on the attention-based encoder.
bag-of-words encoder
Conv encoder: see TextCNN
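The negative log likelihood objective mentioned above can be sketched as follows. This is a minimal NumPy sketch for one target summary, assuming the model produces one row of vocabulary logits per output position; the function `sequence_nll` and the toy numbers are illustrative, not the paper's code.

```python
import numpy as np

def sequence_nll(logits, targets):
    """Negative log likelihood of `targets` (T,) under `logits` (T, vocab)."""
    # Log-softmax per position, computed stably.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Pick the log-probability of the reference word at each position.
    return -log_probs[np.arange(len(targets)), targets].sum()

# With all-zero logits every word has probability 1/4, so three target
# words cost 3 * log(4).
logits = np.zeros((3, 4))
targets = np.array([0, 2, 3])
loss = sequence_nll(logits, targets)
```

Mini-batch SGD would average this quantity over a batch of sentence pairs and step the parameters against its gradient.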

Image captioning: Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

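This is an ICML 2015 paper that brings the attention mechanism to the image domain. The authors, Kelvin Xu, Yoshua Bengio and others, are from the University of Toronto and the Université de Montréal. Image captioning is one of the primary tasks of computer vision: the model must not only identify the objects in an image but also express the relationships between them. Most existing methods use an encoder-decoder architecture and complete the captioning task with neural networks such as CNNs, RNNs and LSTMs. Some use only a CNN to extract image features and then generate the caption from those features. Others combine a CNN with an RNN: the CNN extracts image features, the vector from the layer just before the softmax layer is taken as the encoder output and fed to the decoder, and an LSTM decodes it into a sentence. This is also the approach taken in this paper, except that soft and hard attention mechanisms are embedded on top of it. Besides neural networks, there are two other typical captioning approaches: 1. template-based methods, which fill in objects found in the image; 2. retrieval-based methods, which look for similar descriptions. Both rely on a form of generalization, so the description comes close to the image but is not very accurate. The authors therefore propose their own model architecture, which introduces soft and hard attention into captioning and uses visualization to understand what the attention mechanism does. Model: the overall architecture is shown in the figure above and also consists of an encoder and a decoder. Encoder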
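The difference between soft and hard attention can be sketched in NumPy. The sketch assumes the image is encoded as a set of location feature vectors and that the weights come from a small additive scoring function conditioned on the decoder's hidden state; the parameter names (`W_a`, `W_h`, `v`) and all sizes are hypothetical, not taken from the paper's code.

```python
import numpy as np

def attention_weights(a, h, W_a, W_h, v):
    """a: (L, D) location features, h: (d,) decoder state. Returns (L,) weights."""
    e = np.tanh(a @ W_a + h @ W_h) @ v   # one score per image location
    e = e - e.max()
    return np.exp(e) / np.exp(e).sum()

def soft_context(a, alpha):
    # Soft attention: deterministic weighted average over all locations.
    return alpha @ a

def hard_context(a, alpha, rng):
    # Hard attention: sample a single location according to the weights.
    idx = rng.choice(len(alpha), p=alpha)
    return a[idx], idx

rng = np.random.default_rng(0)
a = rng.normal(size=(9, 5))      # e.g. a 3x3 grid of 5-dim features
h = rng.normal(size=4)
W_a = rng.normal(size=(5, 6))
W_h = rng.normal(size=(4, 6))
v = rng.normal(size=6)

alpha = attention_weights(a, h, W_a, W_h, v)
z_soft = soft_context(a, alpha)
z_hard, picked = hard_context(a, alpha, rng)
```

Soft attention stays differentiable end to end, whereas the sampling step in hard attention does not, which is why the two variants are trained differently.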

Attention Seq2Seq model

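Parameters
Input
encoder_inputs: the encoder input, a list of int32 id tensors
decoder_inputs: the decoder input, a list of int32 id tensors
cell: an instance of RNN_Cell
num_encoder_symbols, num_decoder_symbols: the number of encoder and decoder symbols respectively, i.e. the vocabulary sizes
embedding_size: the dimension of the word vectors
num_heads: the number of attention heads; each head is one way of computing the weighted sum, described further below
output_projection: the projection matrix and bias (W, B) used to project the decoder's output vector into vocabulary space; W has shape [output_size, num_decoder_symbols] and B has shape [num_decoder_symbols]; if this parameter is given and feed_previous=True, the previous decoder output is first multiplied by W and then B is added to form the next decoder input
feed_previous: if True, only the first decoder input (the "GO" symbol) is used, and every later decoder input depends on the output of the previous step; generally used at test time (although the source code also notes that it can be used during training to simulate the test setting, as in Scheduled Sampling)
initial_state_attention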
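The interaction between output_projection and feed_previous can be sketched in NumPy. The sketch follows the description above (multiply by W, add B) and additionally assumes greedy decoding: the highest-scoring symbol is chosen and its embedding becomes the next decoder input. The function `next_decoder_input` and the toy matrices are hypothetical, not the library's code.

```python
import numpy as np

def next_decoder_input(prev_output, W, B, embedding):
    """Turn the previous decoder output into the next decoder input.

    prev_output: (output_size,)
    W: (output_size, num_decoder_symbols), B: (num_decoder_symbols,)
    embedding: (num_decoder_symbols, embedding_size)
    """
    logits = prev_output @ W + B        # project into vocabulary space
    symbol = int(np.argmax(logits))     # assumed: greedy choice of symbol
    return embedding[symbol], symbol    # look up that symbol's embedding

output_size, num_decoder_symbols, embedding_size = 3, 5, 2
W = np.zeros((output_size, num_decoder_symbols))
W[0, 4] = 1.0                           # toy weights: feature 0 votes for symbol 4
B = np.zeros(num_decoder_symbols)
embedding = np.arange(num_decoder_symbols * embedding_size,
                      dtype=float).reshape(num_decoder_symbols, embedding_size)

prev_output = np.array([2.0, 0.0, 0.0])
emb, symbol = next_decoder_input(prev_output, W, B, embedding)
```

With feed_previous=False the same slot would instead be filled from decoder_inputs, which is the usual training setup.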

[Deep Learning] Understanding and Summarizing the Attention Mechanism

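A detailed introduction to the attention mechanism in deep learning: principles, categories and applications. What are the current mainstream attention methods? The attention mechanism helps a model assign different weights to each part of the input X and extract the more critical and important information, so that the model makes more accurate judgments without adding much computation or storage overhead; this is why it is so widely used. I used attention in my earlier research on knowledge-base question answering and reading-comprehension question answering, and the effect was quite significant (although the effect of slowing down training was quite significant too...). After Google published the paper Attention Is All You Need, attention became even more of a common practice. I later found that attention is also applied in the image domain; adding attention to a CNN seemed rather magical, so I am writing a short summary. Once I finish reading that paper, I will come back and add its ideas. RNN with Attention: in NLP, attention is mainly applied on top of the Encoder + Decoder framework. Attention probably first appeared in Bengio's 2014 neural machine translation paper, which introduced attention to seq2seq problems. CNN with Attention: there are two main kinds, spatial attention and channel attention.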

Attention mechanism paper reading: SCA-CNN

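Paper: SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning. The paper proposes a novel convolutional neural network called SCA-CNN, which adds spatial attention and channel-wise attention mechanisms to a CNN. In the image captioning task, SCA-CNN dynamically modulates the sentence generation context in multi-layer feature maps, covering "where" information (the spatial locations in multiple convolutional layers) and "what" information (the channels). Most existing attention-based image captioning models, however, consider only spatial features: they modulate the sentence context into the last layer's feature map through spatial attention weights alone. This paper makes full use of three characteristics of CNN features for visual attention-based image captioning. Attention details: [formula missing], where d is the dimension of the hidden state. At the l-th convolutional layer, the spatial and channel-wise attention weights are computed by a function from […] and […]. SCA-CNN then uses the attention weights to modulate […], giving the modulated features […]. Finally, the t-th word is generated by the following process: [formula missing], where L is the number of convolutional layers and p_t is a probability vector. […] can be approximated by computing the two weights […] and […] separately. […] and […] denote the spatial and the channel attention models, respectively.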
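The idea of modulating a feature map with channel-wise and spatial weights conditioned on the decoder's hidden state can be sketched as below. This is a loose NumPy illustration, not the paper's equations, which are missing from the excerpt. The scoring functions, the parameter names (`w_c`, `U_c`, `w_s`, `U_s`) and the channel-then-spatial ordering are all assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def channel_attention(V, h, w_c, U_c):
    """V: (C, H, W) feature map, h: (d,) hidden state. Returns (C,) weights."""
    pooled = V.mean(axis=(1, 2))                  # one summary value per channel
    return softmax(np.tanh(pooled * w_c + h @ U_c))

def spatial_attention(V, h, w_s, U_s):
    """Returns (H, W) weights over spatial locations."""
    C, H, W = V.shape
    flat = V.reshape(C, H * W)                    # one C-dim vector per location
    scores = np.tanh(w_s @ flat + h @ U_s)        # (H*W,)
    return softmax(scores).reshape(H, W)

def modulate(V, h, p):
    # Assumed ordering: weight the channels ("what"), then locations ("where").
    beta = channel_attention(V, h, p["w_c"], p["U_c"])
    V_c = V * beta[:, None, None]
    alpha = spatial_attention(V_c, h, p["w_s"], p["U_s"])
    return V_c * alpha[None, :, :], beta, alpha

rng = np.random.default_rng(0)
C, H, W, d = 4, 3, 3, 5
V = rng.normal(size=(C, H, W))
h = rng.normal(size=d)
params = {
    "w_c": rng.normal(size=C), "U_c": rng.normal(size=(d, C)),
    "w_s": rng.normal(size=C), "U_s": rng.normal(size=d),
}
X, beta, alpha = modulate(V, h, params)
```

The modulated map `X` keeps the shape of `V`, so the same step could be repeated at each of the L convolutional layers.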
