I'm currently working on improving the inference speed of GPT. One of the factors is the size of the vocabulary on which the GPT model is built (usually the subwor