I'm working on a classifier for a sample of long Wikipedia articles (5000+ tokens per article) using a simple 2-layer LSTM model as part of a school project. The