The TensorFlow tutorial on language modeling allows you to compute the probability of sentences:
probabilities = tf.nn.softmax(logits)
in the code.
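Concretely, once you have the per-step softmax output, a sentence's probability is the product of the probabilities the model assigned to each of its actual next words. A minimal sketch, assuming probabilities has shape [num_steps, vocab_size] and targets holds the true next-word ids (both names are assumptions, not from the tutorial):

import numpy as np

def sentence_log_probability(probabilities, targets):
    # Probability the model assigned to the true next word at each step.
    step_probs = probabilities[np.arange(len(targets)), targets]
    # Sum log-probabilities rather than multiplying raw probabilities,
    # which would underflow on long sentences.
    return float(np.sum(np.log(step_probs)))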
Your output is a TensorFlow tensor, and you can get the index of its maximum entry (the predicted most probable class) with a TensorFlow function. This is normally the tensor that contains the next word's probabilities.
At "Evaluate the Model" from this page, your output list is y
in the following example:
First we'll figure out where we predicted the correct label. tf.argmax is an extremely useful function which gives you the index of the highest entry in a tensor along some axis. For example, tf.argmax(y,1) is the label our model thinks is most likely for each input, while tf.argmax(y_,1) is the true label. We can use tf.equal to check if our prediction matches the truth.

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
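For the language model the same trick applies; a minimal sketch, assuming y is a [batch_size, vocab_size] tensor of next-word probabilities and y_ holds the true words in one-hot form (both are assumptions here):

import tensorflow as tf

predicted_ids = tf.argmax(y, 1)  # most likely word id per example
correct_prediction = tf.equal(predicted_ids, tf.argmax(y_, 1))
# Fraction of positions where the predicted word matches the true one.
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))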
A different approach is to work with pre-vectorized (embedded/encoded) words. You could vectorize your words with Word2vec to accelerate learning; you might want to take a look at this. Each word could be represented as a point in a 300-dimensional space of meaning, and you could automatically find the N words closest to the point the network predicts at its output. In that case, the argmax way to proceed does not work anymore; you would probably compare with cosine similarity against the words you truly wanted to predict, though I am actually not sure whether this could cause numerical instabilities. In that case, y will not represent words as features, but word embeddings with a dimensionality of, let's say, 100 to 2000, depending on the model. You could Google something like "man woman queen word addition word2vec" for more info on the subject of embeddings.
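A minimal sketch of that nearest-neighbour search, assuming embedding_matrix holds one vector per vocabulary word and predicted is the vector the network produced (both names are assumptions):

import numpy as np

def closest_words(predicted, embedding_matrix, n=5):
    # Cosine similarity between the predicted point and every word vector.
    norms = np.linalg.norm(embedding_matrix, axis=1) * np.linalg.norm(predicted)
    similarities = (embedding_matrix @ predicted) / norms
    # Ids of the N most similar words, best first.
    return np.argsort(-similarities)[:n]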
Note: when I talk about Word2vec here, I mean using an external pre-trained Word2vec model so that your training only has pre-embedded inputs and produces embedding outputs. The words corresponding to those outputs can then be figured out again with Word2vec to find the top predicted similar words.
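With gensim, for example (assuming it is installed, that predicted_vector is the embedding your network produced, and that the model file path is an assumption), that re-figuring could look like this:

from gensim.models import KeyedVectors

# Load a pre-trained Word2vec model (path and file are assumptions).
w2v = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin",
                                        binary=True)
# Top predicted words whose embeddings are closest to the network's output.
print(w2v.similar_by_vector(predicted_vector, topn=5))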
Notice that the approach I suggest is not exact, since it only tells you whether we predicted EXACTLY the word we wanted to predict. For a softer evaluation, you could use the ROUGE or BLEU metrics in case you evaluate sentences or something longer than a single word.
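For example, BLEU on tokenized sentences with NLTK (assuming it is installed) could be computed like this:

from nltk.translate.bleu_score import sentence_bleu

reference = [["the", "cat", "sat", "on", "the", "mat"]]   # what we wanted
candidate = ["the", "cat", "is", "on", "the", "mat"]      # what we predicted
# 1.0 means a perfect match; partial overlaps score between 0 and 1.
print(sentence_bleu(reference, candidate))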
You need to find the argmax of the probabilities, and translate the index back to a word by reversing the word_to_id map. To get this to work, you must save the probabilities in the model and then fetch them from the run_epoch function (you could also save just the argmax itself). Here's a snippet:
import numpy as np

# Invert the vocabulary so word ids can be mapped back to words.
inverseDictionary = dict(zip(word_to_id.values(), word_to_id.keys()))

def run_epoch(...):
    # logits holds the model's scores for the predicted next word.
    decodedWordId = int(np.argmax(logits))
    print(" ".join([inverseDictionary[int(x1)] for x1 in np.nditer(x)])
          + " got: " + inverseDictionary[decodedWordId]
          + " expected: " + inverseDictionary[int(y)])
See full implementation here: https://github.com/nelken/tf
It is actually an advantage that the function returns probabilities instead of the word itself. Since it gives you the list of words with their associated probabilities, you can do further processing to increase the accuracy of your result.
To answer your question: you can take the list of words, iterate through it, and make the program display the word with the highest probability, as in the sketch below.
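A minimal sketch of that loop, assuming probabilities is the softmax output for one step and id_to_word is the reversed vocabulary (both names are assumptions):

# Scan the probability list and keep the most likely word.
best_id, best_prob = 0, 0.0
for word_id, prob in enumerate(probabilities):
    if prob > best_prob:
        best_id, best_prob = word_id, prob
print(id_to_word[best_id], "with probability", best_prob)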