lstm

Keras LSTM: time-series multi-step, multi-feature forecasting - poor results

浪尽此生 submitted on 2019-12-03 06:35:23
I have a time series dataset containing data from a whole year (date is the index). The data was measured every 15 minutes throughout the year, which results in 96 timesteps per day. The data is already normalized. The variables are correlated. All the variables except VAR are weather measures. VAR is seasonal with a daily period and a weekly period (it looks a bit different on weekends, but more or less the same every weekend). VAR values are stationary. I would like to predict values of VAR for the next two days (192 steps ahead) and for the next seven days (672 steps ahead). Here is the sample of the …
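
The excerpt cuts off before the data sample and model code. For context, below is a minimal sketch of one common way to frame such a multi-step, multi-feature forecast in Keras (direct multi-step output). The window lengths, feature count, and layer sizes are illustrative assumptions, not the asker's actual setup.

```python
# Hypothetical sketch: direct multi-step forecasting of VAR with a Keras LSTM.
# Assumes `data` is a normalized 2D array of shape (timesteps, n_features),
# with VAR in column 0; all sizes below are placeholders.
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

n_features = 8        # weather variables + VAR (assumption)
lookback = 672        # one week of 15-minute steps as the input window
horizon = 192         # predict the next two days of VAR

def make_windows(data, lookback, horizon):
    X, y = [], []
    for i in range(len(data) - lookback - horizon):
        X.append(data[i:i + lookback])                          # all features
        y.append(data[i + lookback:i + lookback + horizon, 0])  # VAR only
    return np.array(X), np.array(y)

model = Sequential([
    LSTM(64, input_shape=(lookback, n_features)),
    Dense(horizon)    # one output unit per future step (direct multi-step)
])
model.compile(optimizer='adam', loss='mse')
```

The alternative to this direct approach is recursive forecasting (predict one step, feed it back in), which tends to accumulate error over 192 or 672 steps.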

Keras LSTM with masking layer for variable-length inputs

社会主义新天地 submitted on 2019-12-03 05:15:45
Question: I know this is a subject with a lot of questions, but I couldn't find any solution to my problem. I am training an LSTM network on variable-length inputs using a masking layer, but it seems that it doesn't have any effect. Input shape is (100, 362, 24), with 362 being the maximum sequence length, 24 the number of features, and 100 the number of samples (divided 75 train / 25 valid). Output shape is (100, 362, 1), transformed later to (100, 362 - N, 1). Here is the code for my network: from keras import …
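
The excerpt is cut off before the model definition. As a point of reference, here is a minimal sketch of how a Masking layer is usually wired in front of an LSTM for padded sequences; the shapes mirror the question (362 max length, 24 features), but the padding value and layer sizes are assumptions.

```python
# Hypothetical sketch: masking padded timesteps before an LSTM.
# Timesteps whose feature vector equals mask_value are skipped by the LSTM.
from keras.models import Sequential
from keras.layers import Masking, LSTM, TimeDistributed, Dense

max_len, n_features = 362, 24

model = Sequential([
    Masking(mask_value=0.0, input_shape=(max_len, n_features)),
    LSTM(32, return_sequences=True),      # keep per-timestep outputs
    TimeDistributed(Dense(1))             # one prediction per timestep
])
model.compile(optimizer='adam', loss='mse')
```

Note that the mask only takes effect when every feature of a padded timestep equals mask_value exactly, which is a common reason a Masking layer appears to do nothing.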

Shuffling training data with LSTM RNN

感情迁移 submitted on 2019-12-03 05:09:07
Question: Since an LSTM RNN uses previous events to predict current sequences, why do we shuffle the training data? Don't we lose the temporal ordering of the training data? How is it still effective at making predictions after being trained on shuffled training data? Answer 1: In general, when you shuffle the training data (a set of sequences), you shuffle the order in which sequences are fed to the RNN; you don't shuffle the ordering within individual sequences. This is fine to do when your network is …
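
To make the answer's distinction concrete, here is a small illustrative sketch (not from the original answer): the set of sequences is shuffled, but each sequence keeps its internal temporal order.

```python
# Illustrative only: shuffle which sequences are seen, not the steps inside them.
import numpy as np

# X has shape (n_sequences, timesteps, features); y has shape (n_sequences, ...)
def shuffled_epoch(X, y):
    idx = np.random.permutation(len(X))   # reorder whole sequences
    return X[idx], y[idx]                 # timesteps within each sequence untouched
```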

Use LSTM tutorial code to predict next word in a sentence?

邮差的信 submitted on 2019-12-03 04:18:44
Question: I've been trying to understand the sample code for https://www.tensorflow.org/tutorials/recurrent, which you can find at https://github.com/tensorflow/models/blob/master/tutorials/rnn/ptb/ptb_word_lm.py (using tensorflow 1.3.0). I've summarized what I think are the key parts for my question below:
size = 200
vocab_size = 10000
layers = 2
# input_.input_data is a 2D tensor [batch_size, num_steps] of
# word ids, from 1 to 10000
cell = tf.contrib.rnn.MultiRNNCell(
    [tf.contrib.rnn …
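
The code excerpt is truncated, but since the question is about turning the language model into a next-word predictor, here is a hedged sketch of the usual final step: run the inputs through the model to get per-step logits over the vocabulary and take the most likely word at the last position. The tensor names below are placeholders, not the tutorial's actual variables.

```python
# Hypothetical sketch (TF 1.x style): picking the next word from LM logits.
# Assumes `logits` has shape [batch_size, num_steps, vocab_size], produced by
# the unrolled LSTM plus output projection, and that `id_to_word` maps
# vocabulary ids back to strings.
import tensorflow as tf

probs = tf.nn.softmax(logits)            # [batch, num_steps, vocab]
last_step_probs = probs[:, -1, :]        # distribution over the next word
next_word_id = tf.argmax(last_step_probs, axis=1)

# After ids = session.run(next_word_id, feed_dict=...), map ids back to words:
# predicted = [id_to_word[i] for i in ids]
```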

Tensorflow dynamic RNN (LSTM): how to format input?

耗尽温柔 submitted on 2019-12-03 03:58:41
Question: I have been given some data of this format and the following details:
person1, day1, feature1, feature2, ..., featureN, label
person1, day2, feature1, feature2, ..., featureN, label
...
person1, dayN, feature1, feature2, ..., featureN, label
person2, day1, feature1, feature2, ..., featureN, label
person2, day2, feature1, feature2, ..., featureN, label
...
person2, dayN, feature1, feature2, ..., featureN, label
...
There is always the same number of features, but each feature might be a 0 …
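
A hedged sketch of the usual way to feed such per-person, per-day records to tf.nn.dynamic_rnn is shown below: pad every person's sequence of days to a common length and pass the true lengths via sequence_length. The names and sizes are assumptions for illustration, not the asker's data.

```python
# Hypothetical sketch: batch of padded per-person sequences for dynamic_rnn.
# padded_x: [batch_size, max_days, n_features]; seq_len: true number of days per person.
import tensorflow as tf

n_features = 10                      # placeholder
x = tf.placeholder(tf.float32, [None, None, n_features])  # [batch, time, features]
seq_len = tf.placeholder(tf.int32, [None])

cell = tf.nn.rnn_cell.LSTMCell(64)
outputs, state = tf.nn.dynamic_rnn(cell, x, sequence_length=seq_len,
                                   dtype=tf.float32)
# `outputs` has one vector per day; `state.h` summarizes each person's sequence
# after the last real (non-padded) day.
```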

How to train an RNN with LSTM cells for time series prediction

故事扮演 submitted on 2019-12-03 03:47:01
Question: I'm currently trying to build a simple model for predicting time series. The goal is to train the model with a sequence so that it is able to predict future values. I'm using TensorFlow and LSTM cells to do so. The model is trained with truncated backpropagation through time. My question is how to structure the data for training. For example, let's assume we want to learn the given sequence: [1,2,3,4,5,6,7,8,9,10,11,...] and we unroll the network for num_steps=4. Option 1: input …
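
Since the excerpt cuts off at "Option 1", here is a small illustrative sketch (not the asker's code) of the most common windowing for next-step prediction with num_steps=4: inputs and targets are the same sequence shifted by one.

```python
# Illustrative only: inputs/targets for next-step prediction, num_steps = 4.
# For seq = [1,2,3,4,5,6,7,8,9,10,11]:
#   inputs: [1,2,3,4]   targets: [2,3,4,5]
#   inputs: [5,6,7,8]   targets: [6,7,8,9]
import numpy as np

def make_bptt_batches(seq, num_steps=4):
    X, Y = [], []
    for i in range(0, len(seq) - num_steps, num_steps):
        X.append(seq[i:i + num_steps])          # window of inputs
        Y.append(seq[i + 1:i + 1 + num_steps])  # same window shifted by one
    return np.array(X), np.array(Y)
```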

Stateful LSTM and stream predictions

情到浓时终转凉″ submitted on 2019-12-03 03:23:23
I've trained an LSTM model (built with Keras and TF) on multiple batches of 7 samples with 3 features each, shaped like the sample below (the numbers are just placeholders for the purpose of explanation); each batch is labeled 0 or 1:
Data:
[
  [[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]]
  [[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]]
  [[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]]
  ...
]
i.e. batches of m sequences, each of length 7, whose elements are 3-dimensional vectors (so the batch has shape (m, 7, 3)).
Target:
[
  [1]
  [0]
  [1]
  ...
]
On my production …
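
The excerpt stops at the production question, but stream prediction with Keras usually hinges on stateful=True and a fixed batch_input_shape, so here is a hedged sketch of how a stateful LSTM could be declared for this shape of data; the unit count and batch size are assumptions, not the asker's model.

```python
# Hypothetical sketch: stateful Keras LSTM over sequences of (7 timesteps, 3 features).
from keras.models import Sequential
from keras.layers import LSTM, Dense

batch_size = 1   # one-sample-at-a-time streaming at inference (assumption)

model = Sequential([
    LSTM(32, stateful=True, batch_input_shape=(batch_size, 7, 3)),
    Dense(1, activation='sigmoid')        # one 0/1 label per sequence
])
model.compile(optimizer='adam', loss='binary_crossentropy')

# With stateful=True the LSTM state carries over between successive batches;
# call model.reset_states() when a new independent stream begins.
```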

LSTM module for Caffe

Anonymous (unverified) submitted on 2019-12-03 03:04:01
Question: Does anyone know if there exists a nice LSTM module for Caffe? I found one from a github account by russel91, but apparently the webpage containing examples and explanations disappeared (formerly http://apollo.deepmatter.io/ --> it now redirects only to the github page, which has no examples or explanations anymore). Answer 1: I know Jeff Donahue worked on LSTM models using Caffe. He also gave a nice tutorial during CVPR 2015. He has a pull request with RNN and LSTM. Update: there is a new PR by Jeff Donahue including RNN and LSTM. This PR was …

LSTM with Attention

Anonymous (unverified) submitted on 2019-12-03 03:04:01
Question: I am trying to add an attention mechanism to the stacked LSTMs implementation https://github.com/salesforce/awd-lstm-lm All examples online use an encoder-decoder architecture, which I do not want to use (or do I have to, for the attention mechanism?). Basically, I have used https://webcache.googleusercontent.com/search?q=cache:81Q7u36DRPIJ:https://github.com/zhedongzheng/finch/blob/master/nlp-models/pytorch/rnn_attn_text_clf.py+&cd=2&hl=en&ct=clnk&gl=uk def __init__(self, rnn_type, ntoken, ninp, nhid, nlayers, dropout=0.5, dropouth=0.5, dropouti=0.5, …
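
The code in the excerpt is cut off, but since the question is whether attention requires an encoder-decoder, here is a hedged PyTorch sketch of self-attention pooling over the outputs of an LSTM, with no decoder involved; the module and dimension names are illustrative, not from awd-lstm-lm.

```python
# Hypothetical sketch: attention pooling over LSTM outputs (no encoder-decoder).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnPool(nn.Module):
    def __init__(self, nhid):
        super(AttnPool, self).__init__()
        self.score = nn.Linear(nhid, 1)                   # scores each timestep

    def forward(self, rnn_out):                           # rnn_out: (seq_len, batch, nhid)
        scores = self.score(rnn_out).squeeze(-1)          # (seq_len, batch)
        weights = F.softmax(scores, dim=0)                # attention over time
        context = (weights.unsqueeze(-1) * rnn_out).sum(0)  # (batch, nhid)
        return context, weights
```

The pooled context vector can then feed a classifier head or be concatenated with the final hidden state; no decoder is required for this form of attention.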

Batch-major vs time-major LSTM

Anonymous (unverified) submitted on 2019-12-03 02:56:01
Question: Do RNNs learn different dependency patterns when the input is batch-major as opposed to time-major? Answer 1: (Edit: sorry, my initial argument was why it makes sense, but I realized that it doesn't, so this is a little off-topic.) I haven't found the TF groups' reasoning behind this, but it does not make computational sense, as the ops are written in C++. Intuitively, we want to mash up (multiply/add etc.) different features from the same sequence on the same timestep. Different timesteps can't be done in parallel while batches/sequences can, so feature>batch …
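
Whichever layout is faster, it only changes how the tensors are arranged in memory, not what the network can learn; the two formats are related by a simple transpose. A small illustrative sketch follows, with placeholder names chosen for this example.

```python
# Illustrative only: converting between batch-major and time-major layouts.
import tensorflow as tf

# batch-major: [batch_size, max_time, depth]
# time-major : [max_time, batch_size, depth]
batch_major = tf.placeholder(tf.float32, [None, None, 128])
time_major = tf.transpose(batch_major, perm=[1, 0, 2])

# e.g. tf.nn.dynamic_rnn(cell, time_major, time_major=True, dtype=tf.float32)
```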