Writing our own models in OpenNLP

Posted by 坚强是说给别人听的谎言 on 2019-12-01 13:00:33

You'll need to train your own model by annotating some sentences in the OpenNLP name finder training format. For the example sentences you posted, the annotated data would look like this:

what is the risk value on <START:product> icm2500 <END>.
Delivery of <START:product> prd_234 <END> will be arrived late.
Watson is handling <START:product> router_34 <END>.
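If you already know which tokens are product names in your raw sentences, lines in this format can be generated instead of typed by hand. Here is a rough sketch of that idea. The class name TrainingLineFormatter, the annotate helper, and the set of known product names are all made up for illustration and are not part of OpenNLP. It assumes product names are single whitespace-separated tokens, possibly followed by punctuation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class TrainingLineFormatter {

    // Hypothetical helper: wraps every token that is a known product name in
    // <START:product> ... <END> and collapses all whitespace (including
    // newlines) to single spaces so the sentence stays on one line.
    static String annotate(String sentence, Set<String> products) {
        StringBuilder out = new StringBuilder();
        for (String token : sentence.trim().split("\\s+")) {
            // Split off trailing punctuation so "icm2500." still matches "icm2500".
            int end = token.length();
            while (end > 0 && ".,;:!?".indexOf(token.charAt(end - 1)) >= 0) {
                end--;
            }
            String core = token.substring(0, end);
            String tail = token.substring(end);
            if (out.length() > 0) {
                out.append(' ');
            }
            if (products.contains(core)) {
                out.append("<START:product> ").append(core).append(" <END>").append(tail);
            } else {
                out.append(token);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed list of product names, taken from the example sentences.
        Set<String> products = new HashSet<>(Arrays.asList("icm2500", "prd_234", "router_34"));
        System.out.println(annotate("Watson is handling\nrouter_34.", products));
    }
}
```

Write one annotated sentence per line to the training file. Multi-word product names would need a different matching strategy than this token-by-token check.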

Put each sentence on its own line; if a sentence contains newlines, replace or escape them so that it stays on a single line. Once you have built a file like this from your data, you can train the model with the Java API like this:

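import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.util.Collections;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;
import opennlp.tools.util.featuregen.AdaptiveFeatureGenerator;

public class TrainProductModel {

public static void main(String[] args) throws IOException {

// Read the annotated training file, one sentence per line.
Charset charset = Charset.forName("UTF-8");
ObjectStream<String> lineStream =
        new PlainTextByLineStream(new FileInputStream("your file in the above format"), charset);
ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);

TokenNameFinderModel model;

try {
  // "product" matches the <START:product> tags; the cast avoids an ambiguous-overload error on null (OpenNLP 1.5.x API).
  model = NameFinderME.train("en", "product", sampleStream, TrainingParameters.defaultParams(),
            (AdaptiveFeatureGenerator) null, Collections.<String, Object>emptyMap());
}
finally {
  sampleStream.close();
}

// Write the trained model to disk; the file name is a placeholder.
String modelFile = "your model output file";
OutputStream modelOut = null;
try {
  modelOut = new BufferedOutputStream(new FileOutputStream(modelFile));
  model.serialize(modelOut);
} finally {
  if (modelOut != null)
     modelOut.close();
}

}
}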

Now you can load the serialized model into the name finder (NameFinderME) and run it on tokenized sentences.

Because you may have a fixed, and possibly short, list of product names, you might also consider a simple regex approach instead of a trained model.
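As a rough illustration of the regex route, the sketch below builds one pattern from a list of known names and scans a sentence for whole-token matches. The class ProductRegexFinder, its method names, and the sample product list are assumptions made for this example; only java.util.regex is used, nothing from OpenNLP.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ProductRegexFinder {

    // Builds an alternation of the literal product names. The lookarounds
    // require that a match is not embedded in a longer word, so "prd_234"
    // does not match inside "prd_2345". The list is assumed to be non-empty.
    static Pattern fromNames(List<String> names) {
        StringBuilder alternatives = new StringBuilder();
        for (String name : names) {
            if (alternatives.length() > 0) {
                alternatives.append('|');
            }
            alternatives.append(Pattern.quote(name));
        }
        return Pattern.compile("(?<!\\w)(?:" + alternatives + ")(?!\\w)");
    }

    // Returns every product name found in the text, in order of appearance.
    static List<String> findProducts(Pattern pattern, String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Assumed product list, taken from the example sentences.
        Pattern pattern = fromNames(Arrays.asList("icm2500", "prd_234", "router_34"));
        System.out.println(findProducts(pattern, "Delivery of prd_234 will be arrived late."));
    }
}
```

This only finds names that are already on the list, so it suits a closed catalogue. Recognizing unseen product names is where the trained model earns its keep.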

Here are the OpenNLP docs that cover training the NameFinder:

http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.namefind.training.tool