Question
I am doing a long-text classification task where each document has more than 10,000 words. I am planning to use BERT as a paragraph encoder, then feed the paragraph embeddings to a BiLSTM step by step. The network is as below:
Input: (batch_size, max_paragraph_len, max_tokens_per_para, embedding_size)
BERT layer: (batch_size, max_paragraph_len, paragraph_embedding_size)
LSTM layer: ???
Output layer: (batch_size, classification_size)
How can I implement this with Keras? I am using keras-bert's load_trained_model_from_checkpoint to load the BERT model:
from keras_bert import load_trained_model_from_checkpoint

bert_model = load_trained_model_from_checkpoint(
    config_path,
    model_path,
    training=False,
    use_adapter=True,
    # Only the adapter and layer-norm weights are left trainable
    # (adapter-based fine-tuning); the rest of BERT stays frozen.
    trainable=['Encoder-{}-MultiHeadSelfAttention-Adapter'.format(i + 1) for i in range(layer_num)] +
              ['Encoder-{}-FeedForward-Adapter'.format(i + 1) for i in range(layer_num)] +
              ['Encoder-{}-MultiHeadSelfAttention-Norm'.format(i + 1) for i in range(layer_num)] +
              ['Encoder-{}-FeedForward-Norm'.format(i + 1) for i in range(layer_num)],
)
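
A minimal sketch of one way to wire this up, assuming the keras-bert checkpoint model returns the full hidden-state sequence so the [CLS] vector can serve as the paragraph embedding. Because the BERT model takes two inputs (token ids and segment ids), TimeDistributed cannot wrap it directly; instead the paragraph axis is folded into the batch axis before the BERT call and restored afterwards. Here the "LSTM layer: ???" shape comes out as (batch_size, 2 * lstm_units). The names max_paras, max_tokens, hidden, lstm_units, and num_classes are placeholder values, not from the original question:

import keras
from keras import backend as K
from keras.layers import Input, Lambda, Bidirectional, LSTM, Dense

max_paras = 32      # hypothetical: max paragraphs per document
max_tokens = 128    # hypothetical: max tokens per paragraph
hidden = 768        # BERT-base hidden size
lstm_units = 128    # hypothetical
num_classes = 2     # hypothetical

# Document-level inputs: one row of token/segment ids per paragraph.
token_in = Input(shape=(max_paras, max_tokens), name='token_ids')
seg_in = Input(shape=(max_paras, max_tokens), name='segment_ids')

# Fold the paragraph axis into the batch axis:
# (batch, max_paras, max_tokens) -> (batch * max_paras, max_tokens).
flat_tok = Lambda(lambda x: K.reshape(x, (-1, max_tokens)))(token_in)
flat_seg = Lambda(lambda x: K.reshape(x, (-1, max_tokens)))(seg_in)

# Encode every paragraph with BERT; take the [CLS] position
# as that paragraph's embedding.
seq_out = bert_model([flat_tok, flat_seg])       # (batch*paras, tokens, hidden)
cls_out = Lambda(lambda x: x[:, 0, :])(seq_out)  # (batch*paras, hidden)

# Restore the paragraph axis: (batch, max_paras, hidden).
para_emb = Lambda(lambda x: K.reshape(x, (-1, max_paras, hidden)))(cls_out)

# BiLSTM over the paragraph sequence, then classify the document.
doc_vec = Bidirectional(LSTM(lstm_units))(para_emb)
output = Dense(num_classes, activation='softmax')(doc_vec)

model = keras.models.Model([token_in, seg_in], output)
model.compile('adam', 'categorical_crossentropy', metrics=['accuracy'])

One caveat of this folding trick: padded (empty) paragraphs are still pushed through BERT; if that matters, a masking step before the LSTM can zero out their embeddings.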
Source: https://stackoverflow.com/questions/58703885/how-to-implement-network-using-bert-as-a-paragraph-encoder-in-long-text-classifi