How to create a custom model using OpenNLP?

Submitted by 别来无恙 on 2019-12-03 17:16:55

It sounds like you're not happy with the performance of the pre-built name model for OpenNLP. Two things to keep in mind: (a) models are never perfect, and even the best model will miss some things it should have caught and catch some things it should have missed; and (b) a model performs best when the documents it was trained on match the documents you're trying to tag in genre and text style (so a model trained on mixed-case text won't work very well on all-caps text, and a model trained on news articles won't work well on, say, tweets). You can try other publicly available tools, like Stanford NER or LingPipe; they may have better-performing models. But none of them is going to be perfect.

To create a custom model, you'll need to produce some training data. For OpenNLP, each annotated sentence would look something like this:

I have a Ph.D. in <START:skill> operations research <END>

For something as specific as this, you'd probably need to come up with that data yourself. And you'll need a lot of it; the OpenNLP documentation recommends about 15,000 example sentences. Consult the OpenNLP docs for more details.
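If you already have raw sentences and a list of known skill phrases, one way to bootstrap that training data is to tag the phrases automatically and then correct the output by hand. Below is a minimal sketch of that idea using only the Java standard library. The class name `SkillAnnotator` and the method `annotate` are made up for illustration and are not part of OpenNLP, and the sketch assumes the input is already whitespace-tokenized, one sentence per line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not an OpenNLP class): wraps known phrases in
// <START:type> ... <END> markup so the lines can be reviewed and used as training data.
final class SkillAnnotator {

    static String annotate(String sentence, List<String> phrases, String type) {
        if (phrases.isEmpty()) {
            return sentence;
        }
        // Try longer phrases first so "operations research" wins over "research".
        List<String> sorted = new ArrayList<>(phrases);
        sorted.sort((a, b) -> Integer.compare(b.length(), a.length()));

        StringBuilder alternation = new StringBuilder();
        for (String phrase : sorted) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(phrase));
        }
        // Match only whole whitespace-delimited tokens, so the inserted tags
        // always end up separated from the surrounding text by spaces.
        Pattern pattern = Pattern.compile(
                "(?<!\\S)(?:" + alternation + ")(?!\\S)",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

        Matcher matcher = pattern.matcher(sentence);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String tagged = "<START:" + type + "> " + matcher.group() + " <END>";
            matcher.appendReplacement(out, Matcher.quoteReplacement(tagged));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> skills = Arrays.asList("research", "operations research");
        System.out.println(
                annotate("I have a Ph.D. in operations research", skills, "skill"));
        // prints: I have a Ph.D. in <START:skill> operations research <END>
    }
}
```

A phrase glued to punctuation (for example "research.") is deliberately left untagged here, because a tag like `<END>.` would not be a clean token. Automatically tagged data like this still needs a manual pass, since a plain phrase match can't tell when a word is being used in a different sense.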

markgiaconia

This post might help:

OpenNLP: foreign names does not get recognized

It shows how to generate a model using a very new OpenNLP addon called "modelbuilder-addon".

You feed it a file of sentences and a file of known names, and tell it where to put the model. HTH

One way you could do this would be to keep a list of known proper names that can appear in documents. This would also be a good method for skills. When you recognize a named entity, you should check whether it appears on the list.

The other way would be to write your own named-entity extraction component that does a better job than OpenNLP, but that is probably much more difficult.
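The list-based check from the first suggestion is easy to sketch. The example below is only an illustration using the Java standard library: `KnownNameFilter` is a hypothetical class, and the candidate strings are assumed to be the text of entities that a name finder has already returned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: keeps only recognized entities that appear on a known-names list.
final class KnownNameFilter {

    private final Set<String> known = new HashSet<>();

    KnownNameFilter(Iterable<String> knownNames) {
        for (String name : knownNames) {
            known.add(normalize(name));
        }
    }

    // Lowercase and collapse whitespace so "Operations  Research" matches "operations research".
    static String normalize(String s) {
        return s.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    boolean isKnown(String candidate) {
        return known.contains(normalize(candidate));
    }

    // Returns the candidates found on the list, in their original order and spelling.
    List<String> keepKnown(List<String> candidates) {
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (isKnown(candidate)) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        KnownNameFilter filter =
                new KnownNameFilter(Arrays.asList("Operations Research", "Java"));
        // Candidates would normally come from the name finder's output.
        System.out.println(filter.keepKnown(Arrays.asList("Java", "Bananas")));
        // prints: [Java]
    }
}
```

An exact lookup like this trades recall for precision: anything missing from the list is dropped, so the list has to be maintained as new names or skills appear.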

rishi

I have heard of people having good success with Apache UIMA for NER. There was a discussion about this just a day ago here: how to use Entity Recognition with Apache solr and LingPipe or similar tools

It has a few links you might want to have a look at.
