huggingface-tokenizers

How to use "grouped_entities" in the Hugging Face pipeline for an NER task?

折月煮酒 submitted on 2021-01-29 19:00:56
Question: I want to use "grouped_entities" in the Hugging Face pipeline for an NER task, but I am having trouble doing so. I looked at the following pull request on GitHub, but it did not help: https://github.com/huggingface/transformers/pull/4987

Answer 1: I found the answer; it is very straightforward in transformers v4.0.0. I was previously using an older version of the package. Example:

    from transformers import AutoTokenizer, AutoModelForTokenClassification, TokenClassificationPipeline
    from transformers import
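A completed version of that snippet, as a sketch: the model name below is an assumption for illustration (it is not in the original answer), and grouped_entities=True is the v4.0-era flag that merges B-/I- sub-token predictions into whole entities.

    from transformers import (
        AutoTokenizer,
        AutoModelForTokenClassification,
        TokenClassificationPipeline,
    )

    # Model choice is illustrative, not from the original answer
    model_name = "dbmdz/bert-large-cased-finetuned-conll03-english"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForTokenClassification.from_pretrained(model_name)

    # grouped_entities=True returns whole entities ("New York City")
    # instead of one prediction per sub-token
    ner = TokenClassificationPipeline(model=model, tokenizer=tokenizer, grouped_entities=True)
    print(ner("Hugging Face is based in New York City."))

Note that in later transformers releases grouped_entities is deprecated in favor of aggregation_strategy="simple".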

Huggingface saving tokenizer

旧巷老猫 submitted on 2021-01-28 03:31:18
Question: I am trying to save the tokenizer in Hugging Face so that I can load it later from a container where I don't have access to the internet.

    BASE_MODEL = "distilbert-base-multilingual-cased"
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    tokenizer.save_vocabulary("./models/tokenizer/")
    tokenizer2 = AutoTokenizer.from_pretrained("./models/tokenizer/")

However, the last line raises the error:

    OSError: Can't load config for './models/tokenizer3/'. Make sure that: - './models/tokenizer3/'
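A likely fix, sketched under the usual cause of this error: save_vocabulary writes only the vocabulary files, while from_pretrained also needs the tokenizer configuration; save_pretrained writes everything.

    from transformers import AutoTokenizer

    BASE_MODEL = "distilbert-base-multilingual-cased"
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

    # save_pretrained writes the vocab files plus tokenizer_config.json and
    # special_tokens_map.json, which a local from_pretrained load relies on
    tokenizer.save_pretrained("./models/tokenizer/")

    tokenizer2 = AutoTokenizer.from_pretrained("./models/tokenizer/")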

AutoTokenizer.from_pretrained fails to load locally saved pretrained tokenizer (PyTorch)

◇◆丶佛笑我妖孽 submitted on 2020-12-15 09:05:40
Question: I am new to PyTorch and have recently been trying to work with Transformers, using the pretrained tokenizers provided by HuggingFace. I can download and run them successfully, but if I try to save them and load them again, an error occurs. Downloading a tokenizer with AutoTokenizer.from_pretrained works:

    [1]: tokenizer = AutoTokenizer.from_pretrained('distilroberta-base')
         text = "Hello there"
         enc = tokenizer.encode_plus(text)
         enc.keys()
    Out[1]: dict_keys(['input_ids'
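A sketch of one common workaround, assuming the usual cause: in some transformers versions AutoTokenizer cannot infer the tokenizer class from a locally saved directory, while loading via the concrete class for that checkpoint works. The local path below is illustrative.

    from transformers import AutoTokenizer, RobertaTokenizer

    tokenizer = AutoTokenizer.from_pretrained('distilroberta-base')
    tokenizer.save_pretrained('./local_tokenizer')  # illustrative path

    # distilroberta-base uses RobertaTokenizer, so loading through the
    # concrete class sidesteps AutoTokenizer's class inference
    tokenizer2 = RobertaTokenizer.from_pretrained('./local_tokenizer')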

Hugging-Face Transformers: Loading model from path error

别来无恙 submitted on 2020-07-10 10:28:16
Question: I am pretty new to Hugging Face Transformers. I am facing the following issue when I try to load the xlm-roberta-base model from a given path:

    >>> tokenizer = AutoTokenizer.from_pretrained(model_path)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/user/anaconda3/lib/python3.7/site-packages/transformers/tokenization_auto.py", line 182, in from_pretrained
        return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
      File "/home/user
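A sketch of the usual remedy, assuming the directory at model_path is simply missing files the tokenizer needs (for xlm-roberta-base, the SentencePiece model among them): re-save the complete tokenizer with save_pretrained while online, then load from that directory. Paths are illustrative.

    from transformers import AutoTokenizer

    # Download once while online, then persist every file the tokenizer needs
    tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
    tokenizer.save_pretrained("./xlmr-local")  # illustrative path

    # Later, offline, load from the re-saved directory
    tokenizer = AutoTokenizer.from_pretrained("./xlmr-local")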