Question
I have a resource in which I know the exact part of speech of every word. I need to lemmatize the words, but to get correct results I have to tag them manually, and I could not find any code for manually tagging words. I am using the following code, but it returns the wrong result: it gives "painting" for "painting", where I expect "paint".
//...........lemmatization starts........................
Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props, false);
String text = "painting";
Annotation document = pipeline.process(text);
List<edu.stanford.nlp.util.CoreMap> sentences = document.get(SentencesAnnotation.class);
for (edu.stanford.nlp.util.CoreMap sentence : sentences)
{
    for (CoreLabel token : sentence.get(TokensAnnotation.class))
    {
        String word = token.get(TextAnnotation.class);
        String lemma = token.get(LemmaAnnotation.class);
        System.out.println("lemmatized version :" + lemma);
    }
}
//...........lemmatization ends.........................
I need to run the lemmatizer on individual words, not on sentences where POS tagging would be done automatically. So I would first tag the words manually and then find their lemmas. Sample code or a pointer to a reference would be a great help.
Answer 1:
If you know the POS tags in advance, you can get the lemmata in the following way:
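import java.util.List;
import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.process.Morphology;

// Only tokenize and split sentences; the POS tag is supplied manually below.
Properties props = new Properties();
props.put("annotators", "tokenize, ssplit");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props, false);
String text = "painting";
Morphology morphology = new Morphology();
Annotation document = pipeline.process(text);
List<edu.stanford.nlp.util.CoreMap> sentences = document.get(SentencesAnnotation.class);
for (edu.stanford.nlp.util.CoreMap sentence : sentences) {
    for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
        String word = token.get(TextAnnotation.class);
        String tag = ... //get the tag for the current word from somewhere, e.g. an array
        String lemma = morphology.lemma(word, tag);
        System.out.println("lemmatized version :" + lemma);
    }
}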
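The step "get the tag for the current word from somewhere, e.g. an array" could be sketched roughly as below, assuming the manual tags are kept in an array parallel to the words. The class name `ManualTagLemmatizer` and its `lemmatize` helper are hypothetical and not part of CoreNLP. The lemmatizer is passed in as a function so the sketch runs without CoreNLP on the classpath; the stand-in used in `main` is a toy rule for illustration only, not the real morphology.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical helper (not part of CoreNLP): pairs each word with a manually
// assigned POS tag taken from a parallel array and hands both to a lemmatizer.
class ManualTagLemmatizer {

    // lemmatizer takes (word, tag) and returns the lemma. With CoreNLP
    // available, one would presumably pass new Morphology()::lemma here
    // (assumption: that wiring is not exercised in this sketch).
    static List<String> lemmatize(String[] words, String[] tags,
                                  BinaryOperator<String> lemmatizer) {
        if (words.length != tags.length) {
            throw new IllegalArgumentException("each word needs exactly one tag");
        }
        List<String> lemmas = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            lemmas.add(lemmatizer.apply(words[i], tags[i]));
        }
        return lemmas;
    }

    public static void main(String[] args) {
        String[] words = {"painting", "painting"};
        String[] tags = {"VBG", "NN"}; // manual tags, one per word

        // Toy stand-in lemmatizer: strips "ing" from words tagged VBG only.
        BinaryOperator<String> standIn = (word, tag) ->
                tag.equals("VBG") && word.endsWith("ing")
                        ? word.substring(0, word.length() - 3)
                        : word;

        System.out.println(lemmatize(words, tags, standIn)); // prints [paint, painting]
    }
}
```

Keeping the tags in a parallel array mirrors the suggestion in the answer; a `Map<String, String>` would also work if each word form always has the same tag in the resource.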
If you only want the lemma of a single word, you don't even have to run CoreNLP for tokenizing and sentence splitting; you can call the lemma function directly, as follows:
String tag = "VBG";
String word = "painting";
Morphology morphology = new Morphology();
String lemma = morphology.lemma(word, tag);
Source: https://stackoverflow.com/questions/28724111/manual-tagging-of-words-using-stanford-cornlp