LSTM

PyTorch nn.LSTM() parameters explained

Anonymous (unverified), submitted 2019-12-02 23:48:02
Input format:
- input: (seq_len, batch, input_size)
- h0: (num_layers * num_directions, batch, hidden_size)
- c0: (num_layers * num_directions, batch, hidden_size)

Output format:
- output: (seq_len, batch, hidden_size * num_directions)
- hn: (num_layers * num_directions, batch, hidden_size)
- cn: (num_layers * num_directions, batch, hidden_size)

```python
import torch
import torch.nn as nn

# Build the model: input_size is the feature size of the input,
# hidden_size the feature size of the hidden state, num_layers the number of stacked layers.
inputs = torch.randn(5, 3, 10)  # (seq_len, batch_size, input_size)
rnn = nn.LSTM(10, 20, 2)        # (input_size, hidden_size, num_layers)
h0 = torch.randn(2, 3, 20)      # (num_layers * num_directions, batch, hidden_size)
c0 = torch.randn(2, 3, 20)      # (num_layers * num_directions, batch, hidden_size)
output, (hn, cn) = rnn(inputs, (h0, c0))
```
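As a quick sanity check (a minimal sketch, assuming the snippet above has run), the returned shapes match the formats listed; here num_directions is 1 because the LSTM is unidirectional:

```python
print(output.shape)  # torch.Size([5, 3, 20]) -> (seq_len, batch, hidden_size * num_directions)
print(hn.shape)      # torch.Size([2, 3, 20]) -> (num_layers * num_directions, batch, hidden_size)
print(cn.shape)      # torch.Size([2, 3, 20]) -> (num_layers * num_directions, batch, hidden_size)
```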

LSTM compared with RNN

Anonymous (unverified), submitted 2019-12-02 23:43:01
LSTM only mitigates the vanishing-gradient problem of RNNs; it does not by itself solve the exploding-gradient problem. Gradient explosion, however, is not a serious issue in practice: it is usually handled inside the optimization loop by gradient clipping (if the norm of the gradient exceeds a given threshold, the gradient is scaled down proportionally). There are two common clipping schemes, sketched in code below:
1. Clipping by value: when the absolute value of a gradient component exceeds an upper bound, clip that component to the bound.
2. Clipping by norm: when the L2 norm of the gradient exceeds an upper bound, rescale the gradient so that its norm equals the bound, preventing it from growing too large.
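In PyTorch, both schemes correspond to utilities in torch.nn.utils. A minimal sketch of where the calls sit in a training step; the model, data, and threshold values are placeholders, not prescriptions:

```python
import torch
import torch.nn as nn

model = nn.LSTM(10, 20, 2)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inputs, targets = torch.randn(5, 3, 10), torch.randn(5, 3, 20)
optimizer.zero_grad()
output, _ = model(inputs)
loss = nn.functional.mse_loss(output, targets)
loss.backward()

# Scheme 1: clip each gradient component to the interval [-0.5, 0.5].
torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=0.5)
# Scheme 2: rescale the whole gradient so its L2 norm is at most 1.0
# (in practice you would pick one of the two schemes, not both).
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```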

Semantic Role Labeling

半世苍凉, submitted 2019-12-02 23:41:18
Semantic Role Labeling. The source code for this tutorial is in book/label_semantic_roles; first-time users should consult the Book usage documentation.

# Notes

- This tutorial can run in both CPU and GPU environments.
- CUDA/cuDNN versions supported by the Docker image: if you use Docker to run Book, note that the default image provides a GPU environment of CUDA 8/cuDNN 5; on GPUs that require CUDA 9, such as the NVIDIA Tesla V100, this image may fail to run.
- Consistency of the code between this document and the scripts: to make this article easier to read and use, we split up and adjusted the code of train.py and inlined it into the text. The code in this article produces the same results as train.py, which can be run directly for verification.

# Background

Natural language analysis techniques fall roughly into three levels: lexical analysis, syntactic analysis, and semantic analysis. Semantic role labeling is one way to perform shallow semantic analysis. In a sentence, the predicate is a statement or description of the subject: it states "what is done", "what it is", or "how it is", and represents the core of an event. The nouns that combine with the predicate are called arguments. A semantic role is the role an argument plays in the event denoted by the verb. The main roles are: Agent, Patient, Theme, Experiencer, Beneficiary, Instrument, Location, Goal, and Source. Consider the example below, with the predicate "遇到" (to encounter)

Sentiment Analysis

▼魔方 西西, submitted 2019-12-02 23:34:06
Sentiment Analysis. The source code for this tutorial is in book/understand_sentiment; first-time users should consult the Book usage documentation.

# Background

In natural language processing, sentiment analysis generally refers to judging the emotional state expressed by a piece of text, where the text may be a sentence, a paragraph, or a document. The emotional state may be two classes, such as (positive, negative) or (happy, sad), or three classes, such as (positive, negative, neutral). Sentiment analysis has a wide range of applications: classifying the reviews users post on shopping sites (Amazon, Tmall, Taobao, etc.), travel sites, and movie-review sites into positive and negative reviews; or crawling user reviews of a product and running sentiment analysis to gauge users' overall experience with it. Table 1 shows examples of sentiment analysis on movie reviews:

| Movie review | Category |
| --- | --- |
| Of Feng Xiaogang's films in recent years, this counts as the best one | Positive |
| Really bad, like a TV drama from a local station | Negative |
| The circular framing shows off technique throughout, and the palette and backdrops are beautiful, but the plot drags, the accents are neither one thing nor another, and despite trying hard I could never get into it | Negative |
| Four stars for the plot. But the circular framing plus the Wuyuan scenery gives the whole film the feel of a Chinese freehand landscape painting; it is just so comfortable to watch | Positive |

Table 1: Sentiment analysis of movie reviews

In natural language processing, sentiment analysis is a typical text classification problem: the text to be analyzed is assigned to its category. Text classification involves two sub-problems: text representation and the classification method. Before deep learning methods appeared, the mainstream text representations were the bag-of-words (BOW) model, topic models, and so on; classification methods included SVM (support vector machine)
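As a concrete illustration of the pre-deep-learning pipeline described above (bag-of-words features feeding an SVM), here is a minimal scikit-learn sketch; the toy reviews and labels are made up for illustration, not taken from the tutorial's dataset:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Toy data: 1 = positive, 0 = negative (illustrative only).
reviews = ["best film in years", "really bad, like a cheap TV drama",
           "the plot drags and I could not get into it", "so comfortable to watch"]
labels = [1, 0, 0, 1]

# Bag-of-words representation feeding a linear SVM classifier.
clf = make_pipeline(CountVectorizer(), LinearSVC())
clf.fit(reviews, labels)
print(clf.predict(["best thing I have watched in years"]))
```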

Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification (paper notes)

Anonymous (unverified), submitted 2019-12-02 23:32:01
0. Personal remarks
2. Introduction
3. Related work
4. Model

The model consists of five layers:
- Input layer: feed the sentence into the model.
- Embedding layer: map each word to a low-dimensional vector.
- LSTM layer: use a bidirectional LSTM to extract high-level features step by step.
- Attention layer: produce a weight vector, multiply it with the word-level features from each time step, and merge them into a sentence-level feature vector.
- Output layer: the sentence-level feature vector is finally used for relation classification.

4.1 Word Embeddings

Each word $x_i$ is converted into a word embedding $e_i$ by a matrix-vector product:

$$e_i = W^{wrd} v_i$$

where $W^{wrd}$ is a learned parameter matrix, the dimensionality of each word embedding is a hyperparameter that the user specifies in advance, and $v_i$ is a one-hot vector that is 1 at the index of word $x_i$ and 0 everywhere else. The sentence can then be represented as $emb_s = \{e_1, e_2, \dots, e_T\}$.

4.2 Bidirectional LSTM network

4.3 Attention

$H = [h_1, h_2, \dots, h_T]$
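A minimal PyTorch sketch of the attention layer described above, which weights the per-time-step features $h_1, \dots, h_T$ and merges them into a single sentence vector; the class name, hidden size, and tanh/softmax details here are illustrative assumptions, not the paper's reference code:

```python
import torch
import torch.nn as nn

class WordAttention(nn.Module):
    """Collapse per-time-step features H = [h_1..h_T] into one sentence vector."""
    def __init__(self, hidden_size):
        super().__init__()
        self.w = nn.Parameter(torch.randn(hidden_size))  # trainable weight vector

    def forward(self, H):                            # H: (batch, T, hidden_size)
        M = torch.tanh(H)                            # (batch, T, hidden_size)
        scores = M @ self.w                          # (batch, T): one score per time step
        alpha = torch.softmax(scores, dim=1)         # attention weights over time steps
        r = (H * alpha.unsqueeze(-1)).sum(dim=1)     # weighted sum -> (batch, hidden_size)
        return torch.tanh(r)                         # sentence-level feature vector

H = torch.randn(4, 7, 20)          # batch of 4 sentences, T = 7, hidden size 20
sent_vec = WordAttention(20)(H)    # (4, 20)
```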

吴裕雄 -- 天生自然 Python data analysis: historical daily prices and trading volumes of all US stocks and ETFs

Anonymous (unverified), submitted 2019-12-02 22:51:30
```python
# This Python 3 environment comes with many helpful analytics libraries installed.
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load.

import matplotlib.pyplot as plt
import statsmodels.tsa.seasonal as smt
import numpy as np   # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import random
import datetime as dt
from sklearn import linear_model
from sklearn.metrics import mean_absolute_error
import plotly

# Import the relevant Keras modules.
from keras.models import Sequential
```

How do I train tesseract 4 with image data instead of a font file?

为君一笑, submitted 2019-12-02 20:45:31
I'm trying to train Tesseract 4 with images instead of fonts. The docs explain only the approach with fonts, not with images. I know how this works in prior versions of Tesseract, but I haven't figured out how to use the box/tiff files for LSTM training in Tesseract 4. I looked into tesstrain.sh, which is used to generate LSTM training data, but couldn't find anything helpful. Any ideas? Source: https://stackoverflow.com/questions/43352918/how-do-i-train-tesseract-4-with-image-data-instead-of-a-font-file

Keras lstm with masking layer for variable-length inputs

我们两清, submitted 2019-12-02 19:40:25
I know this is a subject with a lot of questions, but I couldn't find any solution to my problem. I am training an LSTM network on variable-length inputs using a masking layer, but it seems to have no effect. The input shape is (100, 362, 24), with 362 being the maximum sequence length, 24 the number of features, and 100 the number of samples (split 75 train / 25 validation). The output shape is (100, 362, 1), transformed later to (100, 362 - N, 1). Here is the code for my network:

```python
from keras import Sequential
from keras.layers import Embedding, Masking, LSTM, Lambda
import keras.backend as K
```
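For reference, a minimal sketch of the kind of masked variable-length model the question describes; the padding value of 0.0, the hidden size of 64, and the loss are assumptions, not the asker's actual code:

```python
from keras import Sequential
from keras.layers import Masking, LSTM, TimeDistributed, Dense

model = Sequential()
# Timesteps whose feature vector is entirely 0.0 are skipped by downstream layers.
model.add(Masking(mask_value=0.0, input_shape=(362, 24)))  # (max_len, n_features)
model.add(LSTM(64, return_sequences=True))                 # one output per timestep
model.add(TimeDistributed(Dense(1)))                       # matches (batch, 362, 1) targets
model.compile(optimizer='adam', loss='mse')
model.summary()
```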

Non-linear multivariate time-series response prediction using RNN

对着背影说爱祢, submitted 2019-12-02 18:39:21
I am trying to predict the hygrothermal response of a wall, given the interior and exterior climate. Based on a literature review, I believe this should be possible with an RNN, but I have not been able to get good accuracy. The dataset has 12 input features (time series of exterior and interior climate data) and 10 output features (time series of the hygrothermal response), both containing hourly values for 10 years. The data was created with hygrothermal simulation software; there is no missing data. (Plots of the dataset features and dataset targets omitted.) Unlike most time-series prediction problems, I want to predict
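A minimal sketch of a sequence-to-sequence RNN matching the shapes described (12 climate inputs and 10 hygrothermal outputs per hour); the window length, hidden size, loss, and random placeholder arrays are illustrative assumptions:

```python
import numpy as np
from keras import Sequential
from keras.layers import LSTM, TimeDistributed, Dense

WINDOW = 168  # one week of hourly values (assumption)

model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(WINDOW, 12)))  # 12 climate inputs
model.add(TimeDistributed(Dense(10)))  # 10 hygrothermal outputs per timestep
model.compile(optimizer='adam', loss='mse')

# Slice the 10-year hourly series into fixed-length windows.
X = np.random.rand(87600, 12)  # placeholder for the climate time series
y = np.random.rand(87600, 10)  # placeholder for the simulated response
Xw = np.stack([X[i:i + WINDOW] for i in range(0, len(X) - WINDOW, WINDOW)])
yw = np.stack([y[i:i + WINDOW] for i in range(0, len(y) - WINDOW, WINDOW)])
model.fit(Xw, yw, epochs=2, batch_size=32)
```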

Keras - stateful vs stateless LSTMs

会有一股神秘感。, submitted 2019-12-02 18:15:35
I'm having a hard time conceptualizing the difference between stateful and stateless LSTMs in Keras. My understanding is that in the stateless case the state of the network is reset at the end of each batch, whereas in the stateful case the state is carried over between batches and must then be reset manually at the end of each epoch. My questions are as follows:
1. In the stateless case, how does the network learn if the state isn't preserved between batches?
2. When would one use the stateless vs. the stateful mode of an LSTM?
I recommend that you first learn the concepts of
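For orientation, a minimal Keras sketch contrasting the two modes; the layer sizes and input shapes are illustrative, not from the question:

```python
from keras import Sequential
from keras.layers import LSTM, Dense

# Stateless (default): the LSTM state is reset after every batch.
stateless = Sequential([
    LSTM(32, input_shape=(10, 4)),  # (timesteps, features)
    Dense(1),
])

# Stateful: state carries over from batch to batch; the batch size must be
# fixed, and the state must be reset manually (e.g. at the end of each epoch).
stateful = Sequential([
    LSTM(32, stateful=True, batch_input_shape=(8, 10, 4)),  # (batch, timesteps, features)
    Dense(1),
])
stateful.reset_states()  # call between epochs or between independent sequences
```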