I am working on Amazon review rating prediction, where the task is to predict a user's rating on a scale of 1-5 given the review text. I found BERT tokenization and the BERT-base m