Question
I'm trying to create an LSTM RNN to generate sequences of music. The training data is a sequence of size-4 vectors, each representing features (including the MIDI note) of one note in the songs I'm training on.
From my reading, it looks like for each input sample, the output sample should be the next size-4 vector -- i.e. the network should try to predict the next note given the current one, with the LSTM incorporating knowledge of the samples that came before.
I'm using tflearn as I'm still very new to RNNs. I have the following code:
net = tflearn.input_data(shape=[None, seqLength, 4])
net = tflearn.lstm(net, 128, return_seq=True)
net = tflearn.dropout(net, 0.5)
net = tflearn.lstm(net, 128)
net = tflearn.dropout(net, 0.5)
net = tflearn.fully_connected(net, 4, activation='softmax')
net = tflearn.regression(net, optimizer='adam',
                         loss='mean_square')
# Training
model = tflearn.DNN(net, tensorboard_verbose=3)
model.fit(trainX, trainY, show_metric=True, batch_size=128)
Before this code I split trainX and trainY into sequences of length 20 (chosen arbitrarily, but I read somewhere that training on fixed-length sequences like this is a common approach).
This seems fine, but I get the error ValueError: Cannot feed value of shape (128, 16, 4) for Tensor u'TargetsData/Y:0', which has shape '(?, 4)'
SO: my assumption so far is that the input shape [None, seqLength, 4] tells TF [batch size (fed sequentially by tflearn), sequence length, feature length of each sample]. What I don't understand is why it says the output is the wrong shape. Am I wrong about how to split the data into sequences? When I instead feed in all my data without splitting it into sequences, so the input shape is [None, 4], TF tells me the LSTM layer expects an input with at least 3 dimensions.
I can't get my head round what the shapes of the inputs and outputs should be. It feels like this should be a simple thing: I have a set of input sequences of vectors, and I want the network to try to predict the next one in each sequence. There's very little online that doesn't assume a fairly advanced level of knowledge, so I've hit a brick wall. I'd really appreciate any insight anyone can give!
Answer 1:
I solved this, so I'm writing the answer here for anyone having the same problem. The error came from a misunderstanding of how these networks work, but this is assumed knowledge in most tutorials I've read, so it may not be clear to other beginners.
LSTM networks are useful in these situations because they can take input history into account. That history is given to the LSTM through the sequencing, but each sequence still leads to a single output data point. So the input must be 3D, of shape (numSamples, historyLength, 4), while the output is just 2D, of shape (numSamples, 4).
Given the entire series and a desired historyLength, I split the input into windows of historyLength vectors, each paired with the single vector that immediately follows it as the target. This solved my shape problem.
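A minimal sketch of that windowing step with NumPy (the names `make_sequences`, `data`, and `history_length` are my own for illustration, not from tflearn):

```python
import numpy as np

def make_sequences(data, history_length):
    """Split an (N, 4) series into (num_samples, history_length, 4) inputs
    and (num_samples, 4) targets, where each target is the vector that
    immediately follows its input window."""
    X, Y = [], []
    for i in range(len(data) - history_length):
        X.append(data[i : i + history_length])  # input window -> one 3D sample
        Y.append(data[i + history_length])      # single next vector -> 2D target
    return np.array(X), np.array(Y)

# Toy data: 100 notes, 4 features each
data = np.arange(400, dtype=np.float32).reshape(100, 4)
trainX, trainY = make_sequences(data, history_length=20)
print(trainX.shape)  # (80, 20, 4) -- 3D, matches input_data(shape=[None, 20, 4])
print(trainY.shape)  # (80, 4)     -- 2D, matches TargetsData/Y:0 shape (?, 4)
```

With data shaped this way, the second LSTM layer (which has return_seq=False by default and so emits one vector per sequence) lines up with the (?, 4) target tensor, and the ValueError goes away.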
Source: https://stackoverflow.com/questions/36519138/tensorflow-tflearn-input-shape