We know that BERT has a maximum input length of 512 tokens. So if an article is much longer than that, say 10,000 tokens, how can BERT be used?
You can leverage the HuggingFace Transformers library, which includes Transformers that work with long texts (more than 512 tokens), such as the Longformer and the Reformer.
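For example, here is a minimal sketch of encoding a long document with the Longformer via the `transformers` library. The `allenai/longformer-base-4096` checkpoint and its 4,096-token limit are published defaults; the input text is just a placeholder:

```python
# Minimal sketch: encode a document longer than 512 tokens with Longformer.
# Requires `pip install transformers torch`; the checkpoint below supports
# sequences of up to 4,096 tokens.
from transformers import LongformerModel, LongformerTokenizer

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

long_text = "word " * 10000  # placeholder standing in for a 10,000-token article

# Tokens beyond 4,096 are truncated; that is still 8x BERT's 512-token limit.
inputs = tokenizer(long_text, truncation=True, max_length=4096, return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)
```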
Eight other recently proposed efficient Transformer models include Sparse Transformers (Child et al., 2019), Linformer (Wang et al., 2020), Sinkhorn Transformers (Tay et al., 2020b), Performers (Choromanski et al., 2020b), Synthesizers (Tay et al., 2020a), Linear Transformers (Katharopoulos et al., 2020), and BigBird (Zaheer et al., 2020).
The Long-Range Arena paper, by authors from Google Research and DeepMind, compares these Transformers using aggregated benchmark metrics.
They also suggest that the Longformer performs better than the Reformer on the classification task.
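Following that suggestion, a long-document classification setup with the Longformer might look like the sketch below. The `num_labels=2` head is an assumption (a binary task), and the head is randomly initialized, so it would need fine-tuning on your data before its predictions mean anything:

```python
# Sketch of long-document classification with Longformer; the classification
# head is freshly initialized here and must be fine-tuned on labeled data.
import torch
from transformers import LongformerForSequenceClassification, LongformerTokenizer

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerForSequenceClassification.from_pretrained(
    "allenai/longformer-base-4096",
    num_labels=2,  # assumed binary task, e.g. positive/negative
)

document = "a very long article ..."  # placeholder document
inputs = tokenizer(document, truncation=True, max_length=4096, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
predicted_class = logits.argmax(dim=-1).item()
```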