recurrent-neural-network

How to handle padding when using sequence_length parameter in TensorFlow dynamic_rnn

[亡魂溺海] posted on 2019-12-04 14:29:28
I'm trying to use the dynamic_rnn function in TensorFlow to speed up training. After some reading, my understanding is that one way to speed up training is to explicitly pass a value to the sequence_length parameter of this function. After a bit more reading, and finding this SO explanation, it seems that what I need to pass is a vector (maybe defined by a tf.placeholder) that contains the length of each sequence within a batch. Here's where I'm confused: in order to take advantage of this, should I pad each of my batches to the longest-length sequence within the batch instead of the
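As a rough illustration of the per-batch padding idea raised in this question, the plain-Python sketch below pads each batch only to the longest sequence in that batch and returns the true lengths alongside it. The helper name `pad_batch` and the pad value of 0 are assumptions made for this example, not part of the original question; the returned lengths are the kind of per-sequence vector one would pass as sequence_length.

```python
def pad_batch(sequences, pad_value=0):
    """Pad variable-length sequences to the longest one in this batch.

    Returns (padded, lengths), where lengths[i] is the true (unpadded)
    length of sequences[i]. `pad_batch` is a hypothetical helper.
    """
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths) if lengths else 0
    padded = [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]
    return padded, lengths


batch = [[5, 3, 8], [7], [2, 9]]
padded, lengths = pad_batch(batch)
print(padded)   # [[5, 3, 8], [7, 0, 0], [2, 9, 0]]
print(lengths)  # [3, 1, 2]
```

Because the padding width is computed per batch, a batch of short sequences stays short instead of being padded to the longest sequence in the whole dataset.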

predict exponential weighted average using a simple rnn

房东的猫 posted on 2019-12-04 14:03:28
In an attempt to further explore the keras-tf RNN capabilities and different parameters, I decided to solve a toy problem as described: build a source data set composed of a sequence of random numbers, and build a "labels" data set comprised of the EWMA formula applied to the source data set. The idea behind it is that EWMA has a very clear and simple definition of how it uses the "history" of the sequence: EWMA_t = (1 - alpha) * average_{t-1} + alpha * x_t. My assumption is that, when looking at a simple RNN cell with a single neuron for the current input and a single one for the previous state, the (1
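The toy data set described here can be sketched in plain Python as follows. This is only an illustration: the function name `ewma`, the choice to seed the average with the first sample, the sequence length of 10, and alpha = 0.3 are all assumptions, since the excerpt does not state them.

```python
import random


def ewma(xs, alpha):
    """Apply EWMA_t = (1 - alpha) * EWMA_{t-1} + alpha * x_t to a sequence.

    Assumption: the running average is seeded with the first sample.
    """
    if not xs:
        return []
    out = [xs[0]]
    for x in xs[1:]:
        out.append((1 - alpha) * out[-1] + alpha * x)
    return out


random.seed(0)                                  # reproducible toy data
source = [random.random() for _ in range(10)]   # random input sequence
labels = ewma(source, alpha=0.3)                # one target per input step
```

Each label depends only on the previous label and the current input, which is the same dependency structure as a single recurrent unit with one input weight and one state weight.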

When to stop training neural networks?

有些话、适合烂在心里 posted on 2019-12-04 12:46:23
I'm trying to carry out domain-specific classification research using an RNN and have accumulated tens of millions of texts. Since it takes days or even months to run over the whole dataset, I picked only a small portion of it for testing, say 1M texts (80% for training, 20% for validation). I pre-trained word vectors on the whole corpus and I also applied Dropout to the model to avoid over-fitting. When it had trained on 60,000 texts within 12 hrs, the loss had already dropped to a fairly low level, with an accuracy of 97%. Should I continue or not? Does it help to continue with the training? It is

Recurrent convolutional BLSTM neural network - arbitrary sequence lengths

走远了吗. posted on 2019-12-04 11:50:57
Using Keras + Theano I successfully made a recurrent bidirectional-LSTM neural network that is capable of training on and classifying DNA sequences of arbitrary lengths, using the following model (for fully working code see: http://pastebin.com/jBLv8B72):
    sequence = Input(shape=(None, ONE_HOT_DIMENSION), dtype='float32')
    dropout = Dropout(0.2)(sequence)
    # bidirectional LSTM
    forward_lstm = LSTM(
        output_dim=50, init='uniform', inner_init='uniform',
        forget_bias_init='one', return_sequences=True,
        activation='tanh', inner_activation='sigmoid',
    )(dropout)
    backward_lstm = LSTM(
        output_dim=50, init=

Predict using data with less time steps (different dimension) using Keras RNN model

不想你离开。 posted on 2019-12-04 11:36:46
According to the nature of RNNs, we can get an output of predicted probabilities at every time step (unfolded in time). Suppose I train an RNN with 5 time steps, each having 6 features. Thus I have to specify the first layer like this (suppose we use an LSTM layer with 20 nodes as the first layer): model.add(LSTM(20, return_sequences=True, input_shape=(5, 6))) And the model works well if I input data of the same dimensions. However, now I want to use the first 3 time steps of the data to get the prediction (the input shape will be (3, 6)), and the same syntax will not be accepted. My question is, is it possible to

Why does RNN always output 1

你离开我真会死。 posted on 2019-12-04 11:26:54
I am using Recurrent Neural Networks (RNN) for forecasting, but for some weird reason, it always outputs 1. Here I explain this with a toy example. Example: consider a matrix M of dimensions (360, 5), and a vector Y which contains the row sums of M. Now, using an RNN, I want to predict Y from M. Using the rnn R package, I trained the model as:
    library(rnn)
    M <- matrix(c(1:1800), ncol=5, byrow=TRUE)  # Matrix (say features)
    Y <- apply(M, 1, sum)  # Output equals the row sums of M
    mt <- array(c(M), dim=c(NROW(M), 1, NCOL(M)))  # matrix formatting as [samples, timesteps, features]
    yt <- array(c(Y),dim=c(NROW(M),1,NCOL(Y)

How is the input tensor for TensorFlow's tf.nn.dynamic_rnn operator structured?

你离开我真会死。 posted on 2019-12-04 09:35:51
I am trying to write a language model using word embeddings and recurrent neural networks in TensorFlow 0.9.0 using the tf.nn.dynamic_rnn graph operation, but I don't understand how the input tensor is structured. Let's say I have a corpus of n words. I embed each word in a vector of length e, and I want my RNN to unroll to t time steps. Assuming I use the default time_major = False parameter, what shape would my input tensor [batch_size, max_time, input_size] have? Maybe a specific tiny example will make this question clearer. Say I have a corpus consisting of n=8 words that looks like this.
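One way to picture the [batch_size, max_time, input_size] layout is the plain-Python sketch below. Everything concrete in it is an assumption for illustration: the values e = 3 and t = 4, the placeholder embedding, the helper `shape_of`, and the decision to cut the corpus into non-overlapping windows of t steps (the question does not say how it forms sequences).

```python
def shape_of(nested):
    """Return the shape of a regular nested list, e.g. [2, 4, 3]."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return shape


n, e, t = 8, 3, 4  # corpus size, embedding length, unroll steps (e and t assumed)

# Placeholder embedding: word i becomes a length-e vector filled with float(i).
embedded = [[float(i)] * e for i in range(n)]

# Assumption: split the corpus into non-overlapping windows of t time steps.
inputs = [embedded[start:start + t] for start in range(0, n - t + 1, t)]

print(shape_of(inputs))  # [2, 4, 3] -> batch_size=2, max_time=4, input_size=3
```

Under these assumptions, max_time corresponds to t, input_size to e, and batch_size to the number of windows obtained from the corpus.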

Seq2Seq model learns to only output EOS token (<\s>) after a few iterations

烈酒焚心 posted on 2019-12-04 08:17:35
I am creating a chatbot trained on the Cornell Movie Dialogs Corpus using NMT. I am basing my code in part on https://github.com/bshao001/ChatLearner and https://github.com/chiphuyen/stanford-tensorflow-tutorials/tree/master/assignments/chatbot During training, I print a random output answer fed to the decoder from the batch and the corresponding answer that my model predicts, to observe the learning progress. My issue: after only about 4 iterations of training, the model learns to output the EOS token (<\s>) for every timestep. It always outputs that as its response (determined using argmax

How to classify continuous audio

落花浮王杯 posted on 2019-12-04 07:15:17
I have an audio data set in which each recording has a different length. There are some events in these recordings that I want to train and test on, but the events are placed randomly and their lengths also differ, so it is really hard to build a machine learning system using that dataset. I thought about fixing a default length and building a multilayer NN; however, the lengths of the events are also different. Then I thought about using a CNN, as is done to recognise patterns or multiple humans in an image. The problem with that is that I am really struggling to understand the audio file. So, my

Varying sequence length in Keras without padding

早过忘川 posted on 2019-12-04 04:38:46
I have a question regarding varying sequence lengths for LSTMs in Keras. I'm passing batches of size 200 and sequences of variable length (= x) with 100 features for each object in the sequence (=> [200, x, 100]) into an LSTM: LSTM(100, return_sequences=True, stateful=True, input_shape=(None, 100), batch_input_shape=(200, None, 100)) I'm fitting the model on the following randomly created matrices: x_train = np.random.random((1000, 50, 100)) x_train_2 = np.random.random((1000, 10, 100)) As far as I understand LSTMs (and the Keras implementation), the x should refer to the number of
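A common way to feed variable-length sequences without padding is to group them so that every batch contains sequences of a single length. The plain-Python sketch below shows that grouping only; the helper name `batches_by_length` and the toy data are assumptions, and it is not taken from the original question or from Keras itself.

```python
from collections import defaultdict


def batches_by_length(sequences, batch_size):
    """Yield batches in which all sequences share the same length.

    Hypothetical helper: sequences are bucketed by length, then each
    bucket is cut into chunks of at most batch_size items.
    """
    buckets = defaultdict(list)
    for seq in sequences:
        buckets[len(seq)].append(seq)
    for length in sorted(buckets):
        group = buckets[length]
        for i in range(0, len(group), batch_size):
            yield group[i:i + batch_size]


seqs = [[1, 2], [3], [4, 5], [6], [7, 8]]
for batch in batches_by_length(seqs, batch_size=2):
    print(batch)
# [[3], [6]]
# [[1, 2], [4, 5]]
# [[7, 8]]
```

Note that the last batch of a bucket can be smaller than batch_size; a model declared with a fixed batch dimension, such as batch_input_shape=(200, None, 100), would need those incomplete batches dropped or handled separately.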