How to use BERT for long text classification?

野的像风 2020-12-14 18:36

We know that BERT has a maximum input length of 512 tokens. So if an article is much longer than 512 tokens, say 10,000 tokens, how can BERT be used?

6 Answers
  •  隐瞒了意图╮
    2020-12-14 19:01

    You can use the HuggingFace Transformers library, which includes the following Transformer models that work with long texts (more than 512 tokens):

    • Reformer: combines the modeling capacity of a Transformer with an architecture that can be executed efficiently on long sequences.
    • Longformer: uses an attention mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or longer.

    Seven other recently proposed efficient Transformer models are Sparse Transformers (Child et al., 2019), Linformer (Wang et al., 2020), Sinkhorn Transformers (Tay et al., 2020b), Performers (Choromanski et al., 2020b), Synthesizers (Tay et al., 2020a), Linear Transformers (Katharopoulos et al., 2020), and BigBird (Zaheer et al., 2020).

    A paper by authors from Google Research and DeepMind compares these Transformers on the aggregated metrics of the Long-Range Arena benchmark.

    They also suggest that Longformer performs better than Reformer on classification tasks.
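
    As an illustration, here is a minimal sketch of long-document classification with Longformer via the Transformers library. The allenai/longformer-base-4096 checkpoint and the two-label setup are illustrative choices, not part of the original answer, and the classification head is randomly initialized, so it would still need fine-tuning on labeled data before the predictions mean anything:

    ```python
    import torch
    from transformers import LongformerForSequenceClassification, LongformerTokenizer

    # Load a pretrained Longformer checkpoint; "allenai/longformer-base-4096"
    # accepts inputs up to 4096 tokens (vs. BERT's 512).
    tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
    model = LongformerForSequenceClassification.from_pretrained(
        "allenai/longformer-base-4096",
        num_labels=2,  # illustrative; the head is untrained until fine-tuned
    )
    model.eval()

    long_article = "your very long article text here ..."

    # Tokenize; anything beyond 4096 tokens is truncated here, so a
    # 10,000-token article would still lose its tail under this setup.
    inputs = tokenizer(long_article, max_length=4096, truncation=True,
                       return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits

    predicted_label = logits.argmax(dim=-1).item()
    print(predicted_label)
    ```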
