Stanford NLP named entities of more than one token

Posted by 我与影子孤独终老i on 2019-12-01 09:29:13

Use the "entitymentions" annotator, which groups each contiguous sequence of tokens sharing the same NER tag into a single entity mention. The list of entity mentions for each sentence is stored under the CoreAnnotations.MentionsAnnotation.class key, and each entity mention is itself a CoreMap.

Looking over this code could help:

https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/pipeline/EntityMentionsAnnotator.java
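
To make the grouping rule concrete, here is a minimal plain-Java sketch that merges contiguous tokens with the same tag into one mention. It is a simplified illustration only, not the logic of EntityMentionsAnnotator: the class name MentionGroupingSketch, the method groupMentions, and the hand-assigned tags are all hypothetical, and "O" is assumed to mark tokens that are not part of any entity.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration only; this is NOT the CoreNLP implementation.
class MentionGroupingSketch {

  // Merges each run of contiguous tokens sharing the same non-"O" tag
  // into one string of the form "mention text<TAB>tag".
  static List<String> groupMentions(String[] tokens, String[] tags) {
    List<String> mentions = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    String currentTag = "O";
    for (int i = 0; i < tokens.length; i++) {
      String tag = tags[i];
      if (!tag.equals(currentTag)) {
        // The tag changed, so close the mention that was open (if any).
        if (!currentTag.equals("O")) {
          mentions.add(current + "\t" + currentTag);
        }
        current.setLength(0);
        currentTag = tag;
      }
      if (!tag.equals("O")) {
        // Extend the open mention with this token.
        if (current.length() > 0) {
          current.append(' ');
        }
        current.append(tokens[i]);
      }
    }
    // Close a mention that runs to the end of the sentence.
    if (!currentTag.equals("O")) {
      mentions.add(current + "\t" + currentTag);
    }
    return mentions;
  }

  public static void main(String[] args) {
    // Tags are assigned by hand for the demo; they are not real CoreNLP output.
    String[] tokens = {"Joe", "Smith", "is", "from", "Florida", "."};
    String[] tags = {"PERSON", "PERSON", "O", "O", "LOCATION", "O"};
    for (String mention : groupMentions(tokens, tags)) {
      System.out.println(mention);
    }
  }
}
```

With these hand-assigned tags, "Joe" and "Smith" end up in a single PERSON mention rather than two separate ones, which is the behavior the question asks for.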

Some sample code:

import java.util.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.util.*;

public class EntityMentionsExample {

  public static void main(String[] args) {
    // "entitymentions" is listed after "ner" because it groups the NER-tagged tokens.
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,entitymentions");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    String text = "Joe Smith is from Florida.";
    Annotation annotation = new Annotation(text);
    pipeline.annotate(annotation);
    System.out.println("---");
    System.out.println("text: " + text);
    // Each sentence carries its own list of entity mentions.
    for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
      for (CoreMap entityMention : sentence.get(CoreAnnotations.MentionsAnnotation.class)) {
        // Print the full mention text and its NER tag, separated by a tab.
        System.out.print(entityMention.get(CoreAnnotations.TextAnnotation.class));
        System.out.print("\t");
        System.out.print(
                entityMention.get(CoreAnnotations.NamedEntityTagAnnotation.class));
        System.out.println();
      }
    }
  }
}