Are 100 training examples sufficient for training a custom NER model using spaCy? [closed]

Submitted by 主宰稳场 on 2020-01-16 15:39:26

Question


I have trained an NER model on names data. I generated about 70 random sentences that contain people's names and annotated them in spaCy's format.

I trained a custom NER model starting from both a blank 'en' model and 'en_core_web_sm', but when I test it on arbitrary strings, it detects the names in very few of them.

Is this number of examples insufficient?

My data looks like this:

[("'Hi, I am looking for a house on rent for a year. Best Regards, Rajesh',\r",
  {'entities': [(56, 63, 'name')]}),
 ("'Hello everyone, I am Gunjan Arora',\r", {'entities': [(22, 34, 'name')]}),
 ("'Greetings!, I am 34 years old. I want a car for my wife Bella Roy',\r",
  {'entities': [(60, 69, 'name')]}),
 ("'Heyo, I lived with my family comprises 4 people and myself Randy Lao',\r",
  {'entities': [(60, 69, 'name')]}),
 ("'I am Geetanjali. ',\r", {'entities': [(6, 16, 'name')]})]

I have generated about 70 examples like this.
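Before blaming the sample size, it may be worth checking that every annotated span actually slices out the intended name. In the first sample above, for instance, the span (56, 63) is seven characters wide while "Rajesh" has six, so that offset may be off (or the text may have changed when the post was copied). The following is a minimal sketch of such a check using plain Python slicing; `span_texts` and `suspicious_spans` are hypothetical helper names, not part of spaCy, and the two samples are copied from the data above.

```python
# Examples in the (text, {"entities": [(start, end, label)]}) format used above.
# The texts keep the stray leading quote and trailing "',\r" from the question.
TRAIN_DATA = [
    ("'Hello everyone, I am Gunjan Arora',\r", {"entities": [(22, 34, "name")]}),
    ("'I am Geetanjali. ',\r", {"entities": [(6, 16, "name")]}),
]


def span_texts(examples):
    """Return (label, start, end, sliced_text) for every annotated span."""
    rows = []
    for text, annotations in examples:
        for start, end, label in annotations["entities"]:
            rows.append((label, start, end, text[start:end]))
    return rows


def suspicious_spans(examples):
    """Flag spans that are out of range or that begin or end on a
    non-alphanumeric character (a common sign of shifted offsets)."""
    flagged = []
    for text, annotations in examples:
        for start, end, label in annotations["entities"]:
            in_range = 0 <= start < end <= len(text)
            sliced = text[start:end] if in_range else ""
            clean = bool(sliced) and sliced[0].isalnum() and sliced[-1].isalnum()
            if not clean:
                flagged.append((text, start, end, label, sliced))
    return flagged


if __name__ == "__main__":
    for label, start, end, sliced in span_texts(TRAIN_DATA):
        print(label, start, end, repr(sliced))
    print("suspicious:", suspicious_spans(TRAIN_DATA))
```

Running the same check over all 70 examples and reading the printed slices is a quick way to spot spans that cover the wrong word. The check is only a heuristic: a span that lands cleanly on a different word will pass it, so eyeballing the output still matters.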

Losses during training:

 - 1.Losses {'ner': 6.307317615201415} 
 - 2.Losses {'ner': 11.182436657139132}
 - 3.Losses {'ner': 6.014345924849759}
 - 4.Losses {'ner': 6.442589285506237}
 - 5.Losses {'ner': 5.328383899880891}
 - 6.Losses {'ner': 1.706726450400089}
 - 7.Losses {'ner': 3.9960324752880005}
 - 8.Losses {'ner': 5.415169572852782}

These are the losses when I use the blank 'en' model.

Please suggest how I can improve this.

I want to detect names because the pre-trained model itself also fails to detect them in most cases.


Answer 1:


For a better result, you will need to generate more examples. 70 examples are not enough to train your model, although that many may work for a simple problem. I would suggest tripling the number of generated examples for a good fit.
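One way to produce more examples without hand-counting offsets is to fill sentence templates with names and compute each span from the placeholder position. The sketch below assumes exactly one `{name}` placeholder per template; `make_example` and `generate_examples` are hypothetical helpers, and the templates and names are taken from the sentences in the question.

```python
import itertools

# Sentence templates with a single {name} placeholder (an assumption of this sketch).
TEMPLATES = [
    "Hello everyone, I am {name}",
    "Hi, I am looking for a house on rent for a year. Best Regards, {name}",
    "Greetings! I want a car for my wife {name}",
]

NAMES = ["Rajesh", "Gunjan Arora", "Bella Roy", "Randy Lao", "Geetanjali"]


def make_example(template, name, label="name"):
    """Fill the template and derive the entity span from the placeholder position."""
    start = template.index("{name}")      # the name begins where the placeholder was
    text = template.replace("{name}", name)
    end = start + len(name)               # end offset is exclusive
    return (text, {"entities": [(start, end, label)]})


def generate_examples(templates, names):
    """Combine every template with every name."""
    return [make_example(t, n) for t, n in itertools.product(templates, names)]


if __name__ == "__main__":
    for text, annotations in generate_examples(TEMPLATES, NAMES)[:3]:
        start, end, label = annotations["entities"][0]
        print(repr(text[start:end]), label, "<-", text)
```

Because the offsets are computed rather than typed by hand, every span matches its name by construction. A handful of templates may still teach the model the templates rather than names in general, so varied sentence structures and name positions probably matter as much as the raw count.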



Source: https://stackoverflow.com/questions/56330196/is-100-training-examples-sufficient-for-training-custom-ner-using-spacy
