huggingface-transformers

Saving and reloading a Hugging Face fine-tuned transformer

一世执手 submitted on 2020-12-26 11:10:03
Question: I am trying to reload a fine-tuned DistilBertForTokenClassification model. I am using transformers 3.4.0 and PyTorch 1.6.0+cu101. After using the Trainer to train the downloaded model, I save it with trainer.save_model(), and in my troubleshooting I also save to a different directory via model.save_pretrained(). I am using Google Colab and saving the model to my Google Drive. After testing the model I also evaluated it on my test set, getting great results; however, when I return…
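
A minimal sketch of the save-and-reload round trip described above; the Drive path is hypothetical, and saving the tokenizer alongside the model is an assumption (without it the folder does not reload standalone):

```python
from transformers import DistilBertForTokenClassification, DistilBertTokenizerFast

SAVE_DIR = "/content/drive/MyDrive/distilbert-ner"  # hypothetical path

# Saving after training (either call writes config.json plus the weights):
# trainer.save_model(SAVE_DIR)
# model.save_pretrained(SAVE_DIR)       # equivalent for the bare model
# tokenizer.save_pretrained(SAVE_DIR)   # assumption: save the tokenizer too

# Reloading in a fresh session:
model = DistilBertForTokenClassification.from_pretrained(SAVE_DIR)
tokenizer = DistilBertTokenizerFast.from_pretrained(SAVE_DIR)
model.eval()  # disable dropout before running inference
```

If reloaded predictions look worse than the pre-save evaluation, one frequent culprit is rebuilding the label mapping (id2label/label2id) in a different order at load time than at training time.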

AutoTokenizer.from_pretrained fails to load locally saved pretrained tokenizer (PyTorch)

◇◆丶佛笑我妖孽 submitted on 2020-12-15 09:05:40
Question: I am new to PyTorch and have recently been trying to work with Transformers. I am using pretrained tokenizers provided by Hugging Face. I can download and run them successfully, but if I try to save them and load them again, an error occurs. If I use AutoTokenizer.from_pretrained to download a tokenizer, it works:

[1]: tokenizer = AutoTokenizer.from_pretrained('distilroberta-base')
     text = "Hello there"
     enc = tokenizer.encode_plus(text)
     enc.keys()
Out[1]: dict_keys(['input_ids'…
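
A sketch of the save-then-reload cycle, assuming transformers 3.x and a hypothetical local directory. A known pitfall in that version: AutoTokenizer needs a config file to infer the tokenizer class, so falling back to the concrete class that distilroberta-base uses is a safer bet when loading from disk:

```python
from transformers import AutoTokenizer, RobertaTokenizerFast

tokenizer = AutoTokenizer.from_pretrained('distilroberta-base')
tokenizer.save_pretrained('./local-tokenizer')  # writes vocab/merges/config files

# Reload: if AutoTokenizer cannot infer the class from the saved files,
# fall back to the concrete tokenizer class.
try:
    tokenizer = AutoTokenizer.from_pretrained('./local-tokenizer')
except (ValueError, OSError):
    tokenizer = RobertaTokenizerFast.from_pretrained('./local-tokenizer')

enc = tokenizer.encode_plus("Hello there")
print(enc.keys())  # dict_keys(['input_ids', 'attention_mask'])
```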

BERT-based NER model giving inconsistent predictions when deserialized

倖福魔咒の submitted on 2020-12-13 04:02:17
Question: I am trying to train an NER model using the HuggingFace transformers library on Colab cloud GPUs, pickle it, and load the model on my own CPU to make predictions. The model is the following:

from transformers import BertForTokenClassification
model = BertForTokenClassification.from_pretrained(
    "bert-base-cased",
    num_labels=NUM_LABELS,
    output_attentions=False,
    output_hidden_states=False,
)

I am using this snippet to save the model on Colab:

import torch
torch.save(model.state_dict(),…
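
A hedged sketch of the usual GPU-to-CPU round trip for a state_dict; NUM_LABELS is a placeholder that must match the training-time value, and the file name is illustrative:

```python
import torch
from transformers import BertForTokenClassification

NUM_LABELS = 9  # placeholder: must equal the value used when training

# On Colab (GPU):
# torch.save(model.state_dict(), "bert_ner.pt")

# On the local CPU machine: rebuild the same architecture, then load weights.
model = BertForTokenClassification.from_pretrained(
    "bert-base-cased",
    num_labels=NUM_LABELS,
    output_attentions=False,
    output_hidden_states=False,
)
state_dict = torch.load("bert_ner.pt", map_location=torch.device("cpu"))
model.load_state_dict(state_dict)
model.eval()  # with dropout left active, predictions vary from run to run
```

Two usual sources of "inconsistent" deserialized predictions are forgetting model.eval() and recomputing the tag-to-id mapping in a different order on the second machine.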

ImportError: cannot import name 'AutoModelWithLMHead' from 'transformers'

自作多情 submitted on 2020-12-12 12:20:33
Question: This is literally all the code that I am trying to run:

from transformers import AutoModelWithLMHead, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-small")
model = AutoModelWithLMHead.from_pretrained("microsoft/DialoGPT-small")

I am getting this error:

ImportError                               Traceback (most recent call last)
<ipython-input-14-aad2e7a08a74> in <module>
----> 1 from transformers import…
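
AutoModelWithLMHead was deprecated and then removed in transformers v4, which is the usual cause of this ImportError. For a decoder-only model such as DialoGPT, the drop-in replacement is AutoModelForCausalLM:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-small")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")
```

(Masked and encoder-decoder models use AutoModelForMaskedLM and AutoModelForSeq2SeqLM respectively; alternatively, pinning transformers below 4.0 restores the old name.)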

Get probability of multi-token word in MASK position

橙三吉。 submitted on 2020-12-05 11:57:31
Question: It is relatively easy to get a token's probability according to a language model, as the snippet below shows. You can take the model's output, restrict it to the output at the masked token's position, and then look up the probability of your requested token in that output vector. However, this only works for single-token words, i.e. words that are themselves in the tokenizer's vocabulary. When a word does not exist in the vocabulary, the tokenizer will chunk it up into pieces that it does know (see…
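
The snippet referenced above is truncated; the following is a minimal reconstruction of the single-token case, assuming a recent transformers version, bert-base-uncased, and the target word "paris" purely for illustration:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

text = f"The capital of France is {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()

with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, seq_len, vocab_size)

probs = torch.softmax(logits[0, mask_pos], dim=-1)
token_id = tokenizer.convert_tokens_to_ids("paris")
print(probs[token_id].item())                # probability of "paris" at the mask
```

For a multi-token word, one common approximation is to insert as many mask tokens as the word has sub-word pieces and combine the per-position probabilities (for example, their product), though the pieces are not scored independently, so this is only a heuristic.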
