When training language models, I often see perplexity used as an evaluation metric. What I am confused about is whether it is simply used to determin