bert-language-model

Fine-tune Bert for specific domain (unsupervised)

Submitted by 自古美人都是妖i on 2021-01-20 08:39:28
Question: I want to fine-tune BERT on texts that are related to a specific domain (in my case, engineering). The training should be unsupervised since I don't have any labels. Is this possible? Answer 1: What you in fact want is to continue pre-training BERT on text from your specific domain, i.e. keep training the model as a masked language model, but on your domain-specific data. You can use the run_mlm.py script from Huggingface's Transformers. Source:
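Not part of the original answer, but as an illustration of what "continue training as a masked language model" amounts to, here is a minimal sketch of the same workflow that run_mlm.py wraps, written directly against the Trainer API. The model name, the corpus file domain_corpus.txt (one document per line), and the hyperparameters are placeholders, not anything from the answer:

from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Raw, unlabeled domain text; replace with your own corpus file.
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

# The collator masks 15% of the tokens on the fly, which is the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(output_dir="bert-domain-mlm",
                         num_train_epochs=3,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args,
        train_dataset=tokenized["train"],
        data_collator=collator).train()

The saved checkpoint can then be used as the starting point for any supervised downstream task once labels become available.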

How to use Bert for long text classification?

Submitted by …衆ロ難τιáo~ on 2021-01-14 04:14:19
Question: We know that BERT has a max length limit of 512 tokens, so if an article is much longer than 512 tokens, say 10,000 tokens, how can BERT be used? Answer 1: You have basically three options: You cut the longer texts off and only use the first 512 tokens. The original BERT implementation (and probably the others as well) truncates longer sequences automatically; for most cases, this option is sufficient. You can split your text into multiple subtexts, classify each of them and …
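The answer is cut off above. Purely as a rough illustration of the second option (overlapping chunks whose predictions are pooled), the sketch below relies on the tokenizer's overflow mechanism; the model name, window/stride values and mean pooling are illustrative assumptions, not part of the answer:

import torch
from transformers import BertForSequenceClassification, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model.eval()

def classify_long_text(text):
    # Split the document into 512-token windows that overlap by 256 tokens.
    enc = tokenizer(text,
                    max_length=512,
                    stride=256,
                    truncation=True,
                    return_overflowing_tokens=True,
                    padding="max_length",
                    return_tensors="pt")
    with torch.no_grad():
        logits = model(input_ids=enc["input_ids"],
                       attention_mask=enc["attention_mask"]).logits
    # Pool the chunk-level predictions; averaging is one simple choice.
    return logits.mean(dim=0)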

Sliding window for long text in BERT for Question Answering

Submitted by 岁酱吖の on 2021-01-05 00:51:51
Question: I've read a post which explains how the sliding window works, but I cannot find any information on how it is actually implemented. From what I understand, if the input is too long, a sliding window can be used to process the text. Please correct me if I am wrong. Say I have the text "In June 2017 Kaggle announced that it passed 1 million registered users". Given some stride and max_len, the input can be split into chunks with overlapping words (not considering padding): In June 2017 Kaggle …
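The question is cut off above. Purely to illustrate the chunking step being described, the sketch below lets a fast tokenizer emit the overlapping windows; the max_length and stride values are small illustrative numbers, not anything from the question:

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
text = "In June 2017 Kaggle announced that it passed 1 million registered users"

enc = tokenizer(text,
                max_length=10,   # tiny window so the overlap is easy to see
                stride=4,        # number of tokens shared by consecutive windows
                truncation=True,
                return_overflowing_tokens=True)
for ids in enc["input_ids"]:
    print(tokenizer.decode(ids))
# Consecutive windows repeat the last `stride` tokens of the previous one, so a
# span cut off at one window boundary is still fully contained in the next window.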

CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`

Submitted by 爷,独闯天下 on 2020-12-30 06:12:46
Question: I got the following error when I ran my PyTorch deep learning model in Colab:

/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
   1370         ret = torch.addmm(bias, input, weight.t())
   1371     else:
-> 1372         output = input.matmul(weight.t())
   1373     if bias is not None:
   1374         output += bias
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`

I even reduced the batch size from 128 to 64, i.e. to half, but I still got this error …
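No answer is included in this excerpt. As a general debugging note (an assumption on my part, not from the question), this particular cuBLAS failure is often a masked secondary error, such as a label or token index that is out of range for the final linear layer, rather than a plain out-of-memory condition. A hedged sketch of two common ways to surface the underlying error:

import os

# Must be set before any CUDA work happens, so the failing kernel is reported
# at its call site instead of as a later cuBLAS allocation failure.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Alternatively, run a single batch on the CPU: indexing problems then raise a
# readable IndexError instead of an opaque CUDA error. (model, input_ids and
# attention_mask are stand-ins for the question's own objects.)
# model_cpu = model.to("cpu")
# out = model_cpu(input_ids.cpu(), attention_mask=attention_mask.cpu())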

BERT-based NER model giving inconsistent prediction when deserialized

Submitted by 倖福魔咒の on 2020-12-13 04:02:17
Question: I am trying to train an NER model using the HuggingFace transformers library on Colab cloud GPUs, pickle it, and load the model on my own CPU to make predictions.

Code

The model is the following:

from transformers import BertForTokenClassification

model = BertForTokenClassification.from_pretrained(
    "bert-base-cased",
    num_labels=NUM_LABELS,
    output_attentions=False,
    output_hidden_states=False
)

I am using this snippet to save the model on Colab:

import torch
torch.save(model.state_dict(), …
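The question is cut off mid-snippet. For context, here is a minimal sketch of the matching load step on a CPU-only machine (the file name and label count are illustrative). Loading without map_location, or forgetting model.eval() so that dropout stays active, are both common reasons a deserialized model predicts differently from the one trained on Colab:

import torch
from transformers import BertForTokenClassification

NUM_LABELS = 9  # must be the same value that was used at training time

model = BertForTokenClassification.from_pretrained(
    "bert-base-cased",
    num_labels=NUM_LABELS,
    output_attentions=False,
    output_hidden_states=False,
)
# map_location remaps tensors saved on the Colab GPU onto the local CPU.
state_dict = torch.load("ner_model.pt", map_location=torch.device("cpu"))
model.load_state_dict(state_dict)
model.eval()  # disable dropout so repeated predictions are deterministic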

AttributeError: 'str' object has no attribute 'dim' in pytorch

Submitted by 流过昼夜 on 2020-12-12 02:06:59
Question: I got the following error output in PyTorch when I sent inputs into the model for prediction. Does anyone know what's going on? Below is the model architecture that I created; the error output shows the issue is in the x = self.fc1(cls_hs) line.

class BERT_Arch(nn.Module):
    def __init__(self, bert):
        super(BERT_Arch, self).__init__()
        self.bert = bert
        # dropout layer
        self.dropout = nn.Dropout(0.1)
        # relu activation function
        self.relu = nn.ReLU()
        # dense layer 1
        self.fc1 = nn.Linear…
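The class definition is cut off above. One frequent cause of this exact error (an assumption, since the rest of the class is not shown) is that in transformers v4+ self.bert(...) returns a ModelOutput, and tuple-unpacking it yields its string keys, so cls_hs ends up being the string 'pooler_output' by the time it reaches self.fc1. A hedged sketch of a forward method that avoids this, using the attribute names from the snippet plus assumed input names sent_id and mask:

def forward(self, sent_id, mask):
    # Select the named field instead of tuple-unpacking the ModelOutput.
    outputs = self.bert(sent_id, attention_mask=mask)
    cls_hs = outputs.pooler_output   # a tensor, not a string key
    x = self.fc1(cls_hs)
    x = self.relu(x)
    x = self.dropout(x)
    return x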