I\'m using pytorch and i\'m using the base pretrained bert to classify sentences for hate speech. i want to implement a Bi-LSTM layer that takes as an input all outputs of t