How to handle <UKN> tokens in text generation
Question

In my text generation dataset, I have converted all infrequent words into the token <ukn> (unknown word), as suggested by most text-generation literature. However, when training an RNN to take part of a sentence as input and predict the rest of the sentence, I am not sure how I should stop the network from generating <ukn> tokens. When the network encounters an unknown (infrequent) word in the training set, what should its output be?

Example:

Sentence: I went to the mall and bought a <ukn> and some
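
For concreteness, the preprocessing step described above (replacing infrequent words with <ukn>) might look roughly like the sketch below. The toy corpus, the `min_count` threshold, and the helper names `build_vocab` and `replace_rare` are assumptions made for illustration; they are not taken from my actual pipeline.

```python
from collections import Counter

UNK = "<ukn>"  # token name as used in the question


def build_vocab(sentences, min_count=2):
    """Return the set of words that occur at least `min_count` times."""
    counts = Counter(word for s in sentences for word in s.split())
    return {word for word, count in counts.items() if count >= min_count}


def replace_rare(sentence, vocab):
    """Replace every out-of-vocabulary word in `sentence` with UNK."""
    return " ".join(w if w in vocab else UNK for w in sentence.split())


# Hypothetical two-sentence corpus; "kettle" and "lamp" each occur only once.
corpus = [
    "I went to the mall and bought a kettle and some tea",
    "I went to the mall and bought a lamp and some tea",
]

vocab = build_vocab(corpus, min_count=2)
processed = [replace_rare(s, vocab) for s in corpus]

for line in processed:
    print(line)
```

After this step, <ukn> appears as an ordinary target token in the training sequences, which is why the network can learn to emit it.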