Given a file, I have to tokenize it (so to speak). Specifically, I need to extract only the following 5 types of tokens: