Why does Stanford CoreNLP NER-annotator load 3 models by default?

Submitted by ∥☆過路亽.° on 2019-12-09 07:09:38

Question


When I add the "ner" annotator to my StanfordCoreNLP object pipeline, I can see that it loads 3 models, which takes a lot of time:

Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [10.3 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [10.1 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [6.5 sec].
Initializing JollyDayHoliday for SUTime from classpath: edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt

Is there a way to load only a subset of them and still get comparable results? In particular, I am unsure why it loads the 3-class and 4-class NER models when it already has the 7-class model, and I'm wondering whether skipping those two would still work.


Answer 1:


You can set which models are loaded in this manner:

command line:

-ner.model model_path1,model_path2

Java code:

 props.put("ner.model", "model_path1,model_path2");

Where model_path1 and model_path2 should be something like:

"edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz"
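For example, here is a minimal sketch of the Java configuration for loading only the 7-class model (the class name is illustrative; constructing the pipeline itself requires the CoreNLP jar on the classpath, so that line is left as a comment):

```java
import java.util.Properties;

public class SingleNerModelDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("annotators", "tokenize, ssplit, pos, lemma, ner");
        // Load only the 7-class model instead of the default three;
        // this starts faster but may change tagging quality.
        props.put("ner.model",
                "edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz");
        // With the CoreNLP jar on the classpath:
        // StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        System.out.println(props.getProperty("ner.model"));
    }
}
```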

The models are applied in layers. The first model is run and its tags applied. Then the second, the third, and so on. If you want fewer models, you can put one or two models in the list instead of the default three, but this will change how the system performs.

If you set "ner.combinationMode" to "HIGH_RECALL", every model is allowed to apply all of its tag types. If you set "ner.combinationMode" to "NORMAL", a later model cannot apply any tag types already applied by an earlier model.
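The combination mode is set the same way as the model list; a minimal sketch (again, the class name is illustrative and the pipeline construction, which requires the CoreNLP jar, is left as a comment):

```java
import java.util.Properties;

public class CombinationModeDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("annotators", "tokenize, ssplit, pos, lemma, ner");
        // HIGH_RECALL: every model may apply all of its tag types,
        // even types already applied by an earlier model in the list.
        props.put("ner.combinationMode", "HIGH_RECALL");
        // With the CoreNLP jar on the classpath:
        // StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        System.out.println(props.getProperty("ner.combinationMode"));
    }
}
```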

The three default models were trained on different data; for instance, the 3-class model was trained on substantially more data than the 7-class model. Each model therefore contributes something different, and their results are combined to produce the final tag sequence.



Source: https://stackoverflow.com/questions/33905412/why-does-stanford-corenlp-ner-annotator-load-3-models-by-default
