opennlp | 易学教程

OpenNLP: Training a custom NER Model for multiple entities

Question: I am trying to train a custom NER model for multiple entities. Here is the sample training data:
count all <START:item_type> operating tables <END> on the <START:location_id> third <END> <START:location_type> floor <END>
count all <START:item_type> items <END> on the <START:location_id> third <END> <START:location_type> floor <END>
how many <START:item_type> beds <END> are in <START:location_type> room <END> <START:location_id> 2 <END>
The NameFinderME.train(.) method takes a string parameter
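The sample data in this question wraps each entity as <START:type> tokens <END>. As a minimal sketch that does not depend on OpenNLP itself, the plain-Java helper below pulls the (type, text) pairs out of one annotated line, which can help sanity-check training data before training. The class name AnnotationCheck and the method entities are hypothetical names chosen for illustration, and the regular expression assumes the markup is whitespace-separated as in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenNLP.
public class AnnotationCheck {
    // Matches "<START:type> some tokens <END>"; group 1 = type, group 2 = entity text.
    private static final Pattern ENTITY =
        Pattern.compile("<START:([^>\\s]+)>\\s+(.*?)\\s+<END>");

    /** Returns "type=text" for every annotated entity found in one training line. */
    public static List<String> entities(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = ENTITY.matcher(line);
        while (m.find()) {
            found.add(m.group(1) + "=" + m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        String line = "how many <START:item_type> beds <END> are in "
            + "<START:location_type> room <END> <START:location_id> 2 <END>";
        // Prints the entity types and their covered text for this line.
        System.out.println(entities(line));
    }
}
```

Running a check like this over every line of the training file makes it easy to spot an unclosed tag or a misspelled entity type.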

NoClassDefFoundError: opennlp/tools/chunker/ChunkerModel

Question: I got this error while trying OpenNLP chunking: NoClassDefFoundError: opennlp/tools/chunker/ChunkerModel. Here is the basic code:
    import java.io.*;
    import opennlp.tools.chunker.*;
    public class test {
        public static void main(String[] args) throws IOException {
            ChunkerModel model = null;
            InputStream modelIn = new FileInputStream("en-parser-chunking.bin");
            model = new ChunkerModel(modelIn);
        }
    }
Answer 1: I don't see any NLP-specific reasons here, so just check tutorials about NoClassDefFoundError, for
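A NoClassDefFoundError usually means the class was available when the code was compiled but cannot be found on the classpath at run time. As a minimal diagnostic sketch (the class ClasspathCheck and its method isOnClasspath are hypothetical names, not part of OpenNLP), the following uses only the JDK to test whether a class is visible and to print the effective classpath:

```java
// Hypothetical diagnostic helper; uses only the JDK.
public class ClasspathCheck {
    /** Returns true if the named class can be located by this class's loader, without initializing it. */
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "opennlp.tools.chunker.ChunkerModel";
        System.out.println(name + " found: " + isOnClasspath(name));
        // The runtime classpath is where the OpenNLP jar has to appear.
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

If the check reports false, the likely fix is to add the OpenNLP tools jar to the classpath used to run the program, not only the one used to compile it.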

opennlp sample training data for disease

Question: I'm using OpenNLP for data classification. I could not find a TokenNameFinderModel for disease here. I know I can create my own model, but I was wondering whether any large sample training data is available for disease? Answer 1: You can easily create your own training dataset using the modelbuilder addon, and follow some rules as mentioned here to train a good NER model. You can find some help on using the modelbuilder addon here. Basically, you put all the information in a text file and the NER

Apache OpenNLP: java.io.FileInputStream cannot be cast to opennlp.tools.util.InputStreamFactory

Question: I am trying to build a custom NER using Apache OpenNLP 1.7. From the documentation available here, I have developed the following code: import java.io.BufferedOutputStream; import java.io.FileInputStream; import java.io.FileOutputStream; import java.io.IOException; import java.nio.charset.Charset; import opennlp.tools.namefind.NameFinderME; import opennlp.tools.namefind.NameSample; import opennlp.tools.namefind.NameSampleDataStream; import opennlp.tools.namefind.TokenNameFinderFactory; import

Why is a self trained NER-Model incompatible with the version of OpenNLP?

Question: I trained an OpenNLP NER model to detect a new entity, but when I use this model I encounter the following exception: Exception in thread "main" java.lang.IllegalArgumentException: opennlp.tools.util.InvalidFormatException: Model version 1.6.0 is not supported by this (1.5.3) version of OpenNLP! I am using OpenNLP version 1.6.0 and my source code is this: String[] sentences = Fragmentation.getSentences(Document); InputStream modelIn = new FileInputStream("Models/en-ner-cvskill.bin");
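The exception text suggests that the OpenNLP classes actually loaded at run time come from a 1.5.3 jar, even though 1.6.0 was used for training. One plausible cause (an assumption, since the excerpt does not show the build setup) is an older jar sitting earlier on the classpath. The sketch below uses only the JDK to report where a class was loaded from; WhichJar and locationOf are hypothetical names introduced for illustration.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper; uses only the JDK.
public class WhichJar {
    /** Returns the jar or directory a class was loaded from, or a note if none is recorded. */
    public static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        return source == null
            ? "(no code source, e.g. a JDK bootstrap class)"
            : String.valueOf(source.getLocation());
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name, for example an OpenNLP class, as the first argument.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " loaded from " + locationOf(Class.forName(name)));
    }
}
```

Running it with an OpenNLP class name as the argument, on the same classpath as the failing program, shows which jar version is really in use.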

How to get the annotated text for a DictionaryAnnotator

Question: I have a dictionary created with the DictionaryCreator from UIMA. I would like to annotate a piece of text using the DictionaryAnnotator and this dictionary, but I could not figure out how to get the annotated text. Please let me know if you do. Any help is appreciated. The code, the dictionary file and the descriptor are given below. P.S. I'm new to Apache UIMA. XMLInputSource xml_in = new XMLInputSource("DictionaryAnnotatorDescriptor.xml"); ResourceSpecifier specifier =

How to extract sentences containing specific person names using R

Question: I am using R to extract sentences containing specific person names from texts. Here is a sample paragraph: Opposed as a reformer at Tübingen, he accepted a call to the University of Wittenberg by Martin Luther, recommended by his great-uncle Johann Reuchlin. Melanchthon became professor of the Greek language in Wittenberg at the age of 21. He studied the Scripture, especially of Paul, and Evangelical doctrine. He was present at the disputation of Leipzig (1519) as a spectator, but

Training own model in opennlp

Question: I am finding it difficult to create my own model in OpenNLP. Can anyone tell me how to create my own model? How should the training be done? What should the input be, and where will the output model file be stored? Answer 1: https://opennlp.apache.org/docs/1.5.3/manual/opennlp.html This website is very useful. It shows how to train models of all different types, such as entity extraction and part-of-speech tagging, both in code and with the OpenNLP application. I could give you some code examples in here, but the page

OpenNLP Name Entity Recognizer output

Question: I have trained an OpenNLP Name Entity Recognizer. When I use it on some data it gives output like: [0..1) location. I would rather output the original name that occurred in the data. Answer 1: This is a Span object's toString() output. Each call to find(String[]) can return multiple Spans, hence the find() method returns Span[]. Use this code to get the actual named entities: //"tokens" here is the String[] of words in your sentence Span[] find = nf.find(tokens); //use the Span's static method
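The output [0..1) is a half-open range of token indices: the start index is included and the end index is excluded. As a minimal sketch of how such a range maps back to text, the plain-Java helper below joins the covered tokens. SpanText and covered are hypothetical names, and the start and end values stand in for what a Span would report; this illustrates the indexing only and is not the library's own method.

```java
import java.util.Arrays;

// Hypothetical illustration of half-open token ranges; not part of OpenNLP.
public class SpanText {
    /** Joins tokens[start..end) with single spaces; end is exclusive, as in "[0..1)". */
    public static String covered(String[] tokens, int start, int end) {
        return String.join(" ", Arrays.copyOfRange(tokens, start, end));
    }

    public static void main(String[] args) {
        String[] tokens = {"Berlin", "is", "large"};
        // Range [0..1) covers only the first token.
        System.out.println(covered(tokens, 0, 1) + " -> location");
    }
}
```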

how to use opennlp on eclipse

Question: I am trying to install OpenNLP so I can use it for my NLP course project. I have Eclipse Kepler on my Windows 8 computer. I have read many online pages about how to install it, but with no luck. I read http://sharpnlp.codeplex.com/discussions/263620 and many other links that the website won't allow me to add, but none of them seems to help. What I did is the following: Microsoft Windows [Version 6.2.9200] (c) 2012 Microsoft Corporation. All rights