I have to do a text classification task with 28 possible classes. I decided to load a pre-trained BERT model and fine-tune it to solve my problem. The thin