I have been trying to build a Transformer-based language model, and I implemented negative log-likelihood (NLL) as the loss function. For some reason, after a few iterations, there is