Manual tagging of words using Stanford CoreNLP

Posted by 妖精的绣舞 on 2019-12-12 10:23:31

Question


I have a resource in which I know the exact type (part of speech) of every word. I need to lemmatize the words, but to get correct results I have to tag them manually. I could not find any code for manually tagging words. I am using the following code, but it returns the wrong result: "painting" for "painting", where I expect "paint".

//...........lemmatization starts........................

Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props, false);
String text = "painting";
Annotation document = pipeline.process(text);

List<edu.stanford.nlp.util.CoreMap> sentences = document.get(SentencesAnnotation.class);

for (edu.stanford.nlp.util.CoreMap sentence : sentences) {
    for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
        String word = token.get(TextAnnotation.class);
        String lemma = token.get(LemmaAnnotation.class);
        System.out.println("lemmatized version :" + lemma);
    }
}

//...........lemmatization ends.........................

I have to run the lemmatizer on individual words, not on sentences, where POS tagging would be done automatically. So I would first tag the words manually and then find their lemmas. Some sample code or a reference to a relevant site would be a great help.


Answer 1:


If you know the POS tags in advance, you can get the lemmas as follows:

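import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.process.Morphology;
import edu.stanford.nlp.util.CoreMap;

// Only tokenize and split sentences; the POS tags are supplied manually.
Properties props = new Properties();
props.put("annotators", "tokenize, ssplit");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props, false);
String text = "painting";

Morphology morphology = new Morphology();

Annotation document = pipeline.process(text);

List<CoreMap> sentences = document.get(SentencesAnnotation.class);

for (CoreMap sentence : sentences) {
  for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
    String word = token.get(TextAnnotation.class);
    String tag = ... //get the tag for the current word from somewhere, e.g. an array
    // Lemmatize the word using the manually supplied tag.
    String lemma = morphology.lemma(word, tag);
    System.out.println("lemmatized version :" + lemma);
  }
}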
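
The answer leaves the source of each tag open ("e.g. an array"). One way to wire that up is sketched below, assuming the words and their manually assigned tags sit in two parallel arrays. The class name ManualTagLemmatizer, the lemmatize helper, and the placeholder lemmatizer are hypothetical and exist only for illustration; the placeholder performs no morphological analysis and is there so the sketch runs without CoreNLP on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

public class ManualTagLemmatizer {

    /**
     * Pairs words[i] with tags[i] and passes both to the given
     * (word, tag) -> lemma function. Hypothetical helper, not part of CoreNLP.
     */
    public static List<String> lemmatize(String[] words, String[] tags,
                                         BiFunction<String, String, String> lemmatizer) {
        if (words.length != tags.length) {
            throw new IllegalArgumentException("each word needs exactly one tag");
        }
        List<String> lemmas = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            lemmas.add(lemmatizer.apply(words[i], tags[i]));
        }
        return lemmas;
    }

    public static void main(String[] args) {
        // Assumed input: words with manually assigned Penn Treebank tags.
        String[] words = {"painting", "houses"};
        String[] tags = {"VBG", "NNS"};

        // Placeholder (assumption): only joins word and tag so the sketch runs
        // standalone. It does NOT compute a real lemma.
        BiFunction<String, String, String> placeholder = (word, tag) -> word + "/" + tag;

        for (String result : lemmatize(words, tags, placeholder)) {
            System.out.println("lemmatized version :" + result);
        }
    }
}
```

With CoreNLP available, the placeholder could presumably be swapped for the call used in the answer, for example `(word, tag) -> morphology.lemma(word, tag)` with `Morphology morphology = new Morphology();`. Parallel arrays keep one tag per occurrence, so the same word can carry different tags at different positions.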

If you only want the lemma of a single word, you don't even need to run CoreNLP for tokenizing and sentence splitting; you can call the lemma function directly, as follows:

String tag = "VBG";      
String word = "painting";
Morphology morphology = new Morphology();
String lemma = morphology.lemma(word, tag);


Source: https://stackoverflow.com/questions/28724111/manual-tagging-of-words-using-stanford-cornlp
