Question
I would like to change beam_width in nlp.entity.cfg from its default of 1 to 3.
I tried nlp.entity.cfg.update({'beam_width': 3}), but nlp seems to be broken after this change. (If I call nlp(str), I get a dict instead of the usual spacy.tokens.doc.Doc that I get with beam_width : 1.)
I want to change it because the NER probabilities will be more accurate in my case (it's my own model that I trained). I computed the probabilities with code I found in the spaCy issues on GitHub:
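from collections import defaultdict

# Build the Doc without running the NER component
with nlp.disable_pipes('ner'):
    doc = nlp(txt)
# Run the NER model with a beam (beam_width and beam_density are described below)
(beams, somethingelse) = nlp.entity.beam_parse([ doc ], beam_width=beam_width, beam_density=beam_density)
# Sum the scores of all beam parses that contain a given entity
entity_scores = defaultdict(float)
for beam in beams:
    for score, ents in nlp.entity.moves.get_beam_parses(beam):
        for start, end, label in ents:
            entity_scores[(doc[start:end].text, label, start, end)] += score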
beam_width: Number of alternate analyses to consider. More is slower, and not necessarily better; you need to experiment on your problem. (Default: 1)
beam_density: This clips solutions at each step. We multiply the score of the top-ranked action by this value, and use the result as a threshold. This prevents the parser from exploring options that look very unlikely, saving a bit of efficiency. Accuracy may also improve, because we've trained on a greedy objective. (Default: 0)
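To make the two parameters concrete, here is a minimal pure-Python sketch of a single pruning step as described above. It is not spaCy's implementation (which works on parser states internally); prune_step, the action names, and the scores are made up for illustration, and the scores are assumed to be non-negative.

```python
def prune_step(action_scores, beam_width=1, beam_density=0.0):
    """Return the (action, score) pairs that survive one pruning step.

    Hypothetical helper, not part of spaCy. Assumes non-negative scores.
    """
    best = max(action_scores.values())
    # beam_density: drop anything scoring below best * beam_density
    threshold = best * beam_density
    survivors = [(a, s) for a, s in action_scores.items() if s >= threshold]
    # beam_width: of what is left, keep only the top-scoring few
    survivors.sort(key=lambda pair: pair[1], reverse=True)
    return survivors[:beam_width]


# Toy scores for four candidate actions at one step
scores = {"B-PERSON": 0.60, "O": 0.30, "B-ORG": 0.08, "B-GPE": 0.02}

greedy = prune_step(scores, beam_width=1)                     # one analysis only
wide = prune_step(scores, beam_width=3)                       # three alternatives
clipped = prune_step(scores, beam_width=3, beam_density=0.2)  # unlikely ones cut
```

With beam_density at 0 nothing is clipped and only beam_width limits the alternatives; with 0.2 the threshold becomes 0.6 * 0.2 = 0.12, so "B-ORG" is dropped even though the beam still has room for it.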
I'm sort of a newbie to NLP, so I don't know what beam search with a global objective is or how to use it. If you can explain it like I'm 5, that would be great!
I would like to be able to use displacy (style='ent') to visualize the entities with beam_width = 3.
Thanks for your answer, Hervé.
Answer 1:
(If I call nlp(str), I get a dict instead of the usual spacy.tokens.doc.Doc that I get with beam_width : 1.)
I'm not sure why that could be. Are you sure? What version are you using?
I just tried the following:
>>> import spacy
>>> nlp = spacy.load('en_core_web_md')
>>> nlp.entity.cfg['beam_width'] = 3
>>> doc = nlp(u'Hurrican Florence is approaching North Carolina.')
>>> doc.ents
(Hurrican Florence, North Carolina)
>>> nlp.entity.cfg['beam_width'] = 300
>>> doc = nlp(u'Hurrican Florence is approaching North Carolina.')
>>> doc.ents
(Hurrican Florence is approaching, North Carolina.)
As you can see, setting a very wide beam results in bad accuracy, because the default model isn't trained to use a wide beam like that.
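On the displacy part of the question: if nlp(text) returns a Doc as in the session above, that Doc can go straight to displacy with style='ent'. If you would rather visualize the entities you scored yourself from the beams, one option is displacy's manual mode, which takes a plain dict. The sketch below is an assumption-heavy illustration: to_displacy_manual is a hypothetical helper, the scores are toy values, and the keys are character offsets (start_char, end_char, label) rather than the token indices used in the question's entity_scores.

```python
def to_displacy_manual(text, entity_scores, min_score=0.5):
    """Build the dict that displacy's manual mode expects for style='ent'.

    Hypothetical helper. entity_scores maps (start_char, end_char, label)
    to a score; candidates below min_score or overlapping a better one
    are dropped.
    """
    # Highest-scoring candidates first, so they win any overlap
    ranked = sorted(entity_scores.items(), key=lambda kv: kv[1], reverse=True)
    chosen = []
    for (start, end, label), score in ranked:
        if score < min_score:
            break
        if all(end <= s or start >= e for s, e, _ in chosen):
            chosen.append((start, end, label))
    chosen.sort()  # displacy expects entities in document order
    ents = [{"start": s, "end": e, "label": l} for s, e, l in chosen]
    return {"text": text, "ents": ents, "title": None}


text = "Hurrican Florence is approaching North Carolina."
# Toy scores, invented for this example
toy_scores = {
    (0, 17, "EVENT"): 0.70,
    (9, 17, "PERSON"): 0.30,
    (33, 47, "GPE"): 0.95,
}
data = to_displacy_manual(text, toy_scores)
```

The resulting dict can then be passed as spacy.displacy.render(data, style='ent', manual=True). To get character offsets from token indices, doc[start:end].start_char and doc[start:end].end_char should do it.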
As for the ELI5... well, it's complicated :(. Sorry, I don't have a simple explanation handy, which is one reason these are undocumented internals.
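For a rough intuition only, here is a toy beam search in plain Python. It is a generic textbook-style sketch, not how spaCy's parser is implemented, and the two-tag scheme ("O", "ENT") and all the probabilities are invented. The idea: a greedy search (beam_width 1) commits to the best-looking choice at each step, while a wider beam keeps a few runners-up alive in case they turn out better later.

```python
# Toy model: probability of the first tag, then of each tag given the previous one
FIRST = {"O": 0.6, "ENT": 0.4}
NEXT = {
    "O": {"O": 0.55, "ENT": 0.45},
    "ENT": {"ENT": 0.9, "O": 0.1},
}


def beam_search(n_steps, beam_width):
    """Return up to beam_width (sequence, score) pairs, best first."""
    beam = [((), 1.0)]  # start with one empty hypothesis
    for _ in range(n_steps):
        candidates = []
        for seq, score in beam:
            options = NEXT[seq[-1]] if seq else FIRST
            for tag, p in options.items():
                # Extend every surviving hypothesis by every possible tag
                candidates.append((seq + (tag,), score * p))
        # Keep only the beam_width best hypotheses for the next step
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = candidates[:beam_width]
    return beam


greedy = beam_search(n_steps=2, beam_width=1)
wide = beam_search(n_steps=2, beam_width=2)
```

Here the greedy search picks "O" first (0.6 beats 0.4) and ends with ("O", "O") at about 0.33. With a beam of 2, the "ENT" hypothesis survives the first step and ("ENT", "ENT") wins overall at about 0.36. The flip side is what the session above shows: a model trained greedily is not necessarily calibrated for a wide beam, so wider is not automatically better.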
Source: https://stackoverflow.com/questions/52316842/change-beam-width-in-spacy-ner