We know that BERT has a maximum input length of 512 tokens. So if an article is much longer than that, say 10,000 tokens, how can BERT be used?
One approach is described in the paper Defending Against Neural Fake News (https://arxiv.org/abs/1905.12616).
Their generative model produced outputs of 1024 tokens, and they wanted to use BERT to discriminate between human and machine generations. They extended the sequence length BERT can handle simply by initializing 512 additional position embeddings and training those while fine-tuning BERT on their dataset.
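As a rough illustration of that idea (not the paper's actual code), here is a minimal sketch using the Hugging Face transformers library: copy the 512 pretrained position embeddings into a larger table and leave the new rows randomly initialized so they get learned during fine-tuning. The checkpoint name and target length are assumptions for the example.

```python
import torch
from transformers import BertModel

# Assumed setup: extend bert-base-uncased from 512 to 1024 positions,
# mirroring the paper's trick of adding 512 new embeddings.
NEW_MAX_LEN = 1024

model = BertModel.from_pretrained("bert-base-uncased")

old_emb = model.embeddings.position_embeddings  # nn.Embedding(512, hidden)
old_len, hidden = old_emb.weight.shape

# Build a larger position-embedding table: copy the 512 pretrained rows,
# leave the remaining rows at their fresh random init so they are
# trained during fine-tuning.
new_emb = torch.nn.Embedding(NEW_MAX_LEN, hidden)
with torch.no_grad():
    new_emb.weight[:old_len] = old_emb.weight
model.embeddings.position_embeddings = new_emb

# Keep the config and cached buffers consistent so the model accepts
# inputs longer than 512 tokens.
model.config.max_position_embeddings = NEW_MAX_LEN
model.embeddings.register_buffer(
    "position_ids", torch.arange(NEW_MAX_LEN).expand((1, -1))
)
if hasattr(model.embeddings, "token_type_ids"):
    model.embeddings.register_buffer(
        "token_type_ids", torch.zeros((1, NEW_MAX_LEN), dtype=torch.long)
    )
```

After this the model will run on sequences up to 1024 tokens, but the new positions carry no pretrained signal yet; the point of the paper's approach is that fine-tuning on the target dataset is what teaches them. You would also tokenize with a matching `max_length` so inputs actually reach the extended range.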