bert-language-model

What should properly formatted data for NER with BERT look like?

Submitted by 别来无恙 on 2020-08-09 08:57:28
Question: I am using Huggingface's transformers library and want to perform NER using BERT. I tried to find an explicit example of how to properly format the data for NER with BERT, but it is not entirely clear to me from the paper and the comments I've found. Let's say we have the following sentence and labels:

sent = "John Johanson lives in Ramat Gan."
labels = ['B-PER', 'I-PER', 'O', 'O', 'B-LOC', 'I-LOC']

Would the data we input to the model be something like this:

sent = ['[CLS]', 'john', 'johan', '#
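
A minimal sketch of how this alignment is commonly done with the transformers fast tokenizers, assuming bert-base-uncased (the lowercased tokens in the question suggest an uncased model), the convention of labeling only the first sub-token of each word, and an added 'O' label for the final period, which the question's label list stops before:

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

words = ["John", "Johanson", "lives", "in", "Ramat", "Gan", "."]
labels = ["B-PER", "I-PER", "O", "O", "B-LOC", "I-LOC", "O"]  # 'O' for '.' is an assumption

encoding = tokenizer(words, is_split_into_words=True)

aligned = []
prev_word = None
for word_id in encoding.word_ids():
    if word_id is None:            # special tokens [CLS] / [SEP]
        aligned.append(-100)       # -100 is ignored by PyTorch's cross-entropy loss
    elif word_id != prev_word:     # first sub-token of a word keeps the word's label
        aligned.append(labels[word_id])
    else:                          # continuation pieces such as '##son'
        aligned.append(-100)
    prev_word = word_id

print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))  # exact pieces depend on the vocab
print(aligned)

Whether continuation sub-tokens get -100, a copy of the word's label, or a dedicated 'X' label (as illustrated in the original BERT paper) is a modeling choice; -100 simply excludes them from the loss.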

spaCy's BERT model doesn't learn

Submitted by ≯℡__Kan透↙ on 2020-08-03 09:25:32
Question: I've been trying to use spaCy's pretrained BERT model de_trf_bertbasecased_lg to increase accuracy in my classification project. I used to build a model from scratch using de_core_news_sm and everything worked fine: I had an accuracy of around 70%. But now I am using the pretrained BERT model instead and I'm getting 0% accuracy. I don't believe it really performs that badly, so I'm assuming there is just a problem with my code. I might have missed something important, but I can't figure out what. I
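
The asker's code is cut off in this excerpt, but a minimal sketch of a training loop for this setup, following the spacy-transformers v0.x API that de_trf_bertbasecased_lg targets, might look like the following. The labels and training examples are hypothetical placeholders, and the lowered learning rate reflects a common fix when transformer fine-tuning appears not to learn; it is an assumption, not a confirmed diagnosis of the asker's bug:

import random
import spacy
from spacy.util import minibatch

nlp = spacy.load("de_trf_bertbasecased_lg")
textcat = nlp.create_pipe("trf_textcat", config={"exclusive_classes": True})
for label in ("POSITIVE", "NEGATIVE"):   # hypothetical labels
    textcat.add_label(label)
nlp.add_pipe(textcat)

TRAIN_DATA = [  # hypothetical examples
    ("Das war großartig.", {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}}),
    ("Das war furchtbar.", {"cats": {"POSITIVE": 0.0, "NEGATIVE": 1.0}}),
]

optimizer = nlp.resume_training()        # keep the pretrained BERT weights
optimizer.alpha = 2e-5                   # transformer-scale learning rate (Thinc v7 calls it alpha)
for epoch in range(4):
    random.shuffle(TRAIN_DATA)
    losses = {}
    for batch in minibatch(TRAIN_DATA, size=8):
        texts, annotations = zip(*batch)
        nlp.update(texts, annotations, sgd=optimizer, losses=losses)
    print(epoch, losses)

Using resume_training() rather than begin_training() matters here: the latter would reinitialize weights and throw away the pretrained transformer.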

BERT embedding for semantic similarity

Submitted by 不羁岁月 on 2020-05-14 18:10:22
Question: I posted this question earlier. I wanted to get embeddings similar to those in this YouTube video, from the 33-minute mark onward. 1) I don't think the embeddings I am getting from the [CLS] token are similar to what is shown in the YouTube video. I tried to perform semantic similarity and got horrible results. Could someone confirm whether the embeddings I am getting are similar to the embeddings mentioned at the 35:27 mark of the video? 2) If the answer to the above question is 'not similar', then how could I get
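
One widely used alternative to comparing raw [CLS] vectors is to mean-pool the last hidden states over the attention mask and compare the pooled vectors by cosine similarity. A minimal sketch, assuming bert-base-uncased and illustrative sentences; libraries such as sentence-transformers package a fine-tuned version of this idea:

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentences):
    enc = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state         # (batch, seq_len, dim)
    mask = enc["attention_mask"].unsqueeze(-1).float()  # zero out padding positions
    return (hidden * mask).sum(1) / mask.sum(1)         # mean over real tokens only

a, b = embed(["A man is playing guitar.", "Someone plays a guitar."])  # illustrative sentences
print(torch.cosine_similarity(a, b, dim=0).item())

Note that vectors from a plain pretrained BERT, whether [CLS] or mean-pooled, are known to give mediocre semantic-similarity scores out of the box; models fine-tuned on sentence-pair objectives (e.g. Sentence-BERT) perform substantially better at this task.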