I have a question regarding "on-the-fly" tokenization. This question was elicited by reading the "How to train a new language model from scratch using Transfo