I'm creating a model using BertModel to identify the answer span (without using BertForQA).
I have an independent linear layer for determining the start and end token, respectively
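
A minimal sketch of this kind of setup, assuming PyTorch and Hugging Face `transformers`. The class name `SpanModel`, the attribute names `start_head` and `end_head`, and the tiny randomly initialised `BertConfig` are hypothetical choices for illustration, not the actual code behind the question:

```python
import torch
from torch import nn
from transformers import BertConfig, BertModel


class SpanModel(nn.Module):
    """BERT encoder with two independent linear heads (hypothetical names)."""

    def __init__(self, encoder: BertModel):
        super().__init__()
        self.encoder = encoder
        hidden = encoder.config.hidden_size
        # Separate, unshared layers: one scores each token as the span start,
        # the other scores each token as the span end.
        self.start_head = nn.Linear(hidden, 1)
        self.end_head = nn.Linear(hidden, 1)

    def forward(self, input_ids, attention_mask=None):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        hidden_states = outputs.last_hidden_state  # (batch, seq_len, hidden)
        start_logits = self.start_head(hidden_states).squeeze(-1)  # (batch, seq_len)
        end_logits = self.end_head(hidden_states).squeeze(-1)      # (batch, seq_len)
        return start_logits, end_logits


# Assumption: a small random config so the sketch runs without downloading weights.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)
model = SpanModel(BertModel(config))

input_ids = torch.randint(0, 100, (2, 16))  # dummy batch: 2 sequences of 16 tokens
start_logits, end_logits = model(input_ids)
```

In practice the encoder would more likely come from `BertModel.from_pretrained(...)` with a real checkpoint, and each set of logits would typically be trained with a cross-entropy loss over token positions.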