How to extract Named Entity + Verb from text

Posted by 时光怂恿深爱的人放手 on 2019-12-05 14:20:09

Here is some sample code to help with your problem:

import java.io.*;
import java.util.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.util.*;



public class NERAndVerbExample {

  public static void main(String[] args) throws IOException {
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,depparse,entitymentions");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    String text = "John Smith went to the store.";
    Annotation annotation = new Annotation(text);
    pipeline.annotate(annotation);
    System.out.println("---");
    System.out.println("text: " + text);
    System.out.println("");
    System.out.println("dependency edges:");
    for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
      SemanticGraph sg = sentence.get(SemanticGraphCoreAnnotations.CollapsedDependenciesAnnotation.class);
      for (SemanticGraphEdge sge : sg.edgeListSorted()) {
        System.out.println(
                sge.getGovernor().word() + "," + sge.getGovernor().index() + "," + sge.getGovernor().tag() + "," +
                        sge.getGovernor().ner()
                        + " - " + sge.getRelation().getLongName()
                        + " -> "
                        + sge.getDependent().word() + ","
                        + sge.getDependent().index() + "," + sge.getDependent().tag() + "," + sge.getDependent().ner());
      }
      System.out.println();
      System.out.println("entity mentions:");
      for (CoreMap entityMention : sentence.get(CoreAnnotations.MentionsAnnotation.class)) {
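        // Position of the mention's last token within the mention's own token list;
        // the IndexAnnotation read from that token below is its 1-based index in the sentence.
        int lastTokenIndex = entityMention.get(CoreAnnotations.TokensAnnotation.class).size()-1;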
        System.out.println(entityMention.get(CoreAnnotations.TextAnnotation.class) +
                "\t" +
                entityMention.get(CoreAnnotations.TokensAnnotation.class)
                        .get(lastTokenIndex).get(CoreAnnotations.IndexAnnotation.class) + "\t" +
                entityMention.get(CoreAnnotations.NamedEntityTagAnnotation.class));
      }
    }
  }
}

I'm hoping to add some syntactic sugar to Stanford CoreNLP 3.8.0 to make working with entity mentions easier.

To explain this code a bit: the entitymentions annotator goes through the tokens and groups adjacent tokens with the same NER tag together, so "John Smith" gets marked as a single entity mention.
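
To make the grouping idea concrete, here is a minimal sketch that does not use CoreNLP at all. The class name MentionGroupingSketch, the method groupMentions, and the hand-written word and tag arrays are all assumptions made for illustration; the real annotator applies more rules than this, so treat it as a picture of the basic idea only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not the CoreNLP implementation.
public class MentionGroupingSketch {

  // Groups runs of adjacent tokens that share the same non-"O" tag.
  // Each result is "mention text<TAB>tag".
  public static List<String> groupMentions(String[] words, String[] tags) {
    List<String> mentions = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    String currentTag = "O";
    for (int i = 0; i < words.length; i++) {
      String tag = tags[i];
      if (!tag.equals(currentTag)) {
        // The tag changed, so close any mention that was open.
        if (!currentTag.equals("O")) {
          mentions.add(current + "\t" + currentTag);
        }
        current.setLength(0);
        currentTag = tag;
      }
      if (!tag.equals("O")) {
        if (current.length() > 0) {
          current.append(' ');
        }
        current.append(words[i]);
      }
    }
    // Close a mention that runs to the end of the sentence.
    if (!currentTag.equals("O")) {
      mentions.add(current + "\t" + currentTag);
    }
    return mentions;
  }

  public static void main(String[] args) {
    // Hand-written tags for the sample sentence, not actual tagger output.
    String[] words = {"John", "Smith", "went", "to", "the", "store", "."};
    String[] tags = {"PERSON", "PERSON", "O", "O", "O", "O", "O"};
    for (String mention : groupMentions(words, tags)) {
      System.out.println(mention);
    }
  }
}
```

With the hand-written tags above, the two PERSON tokens end up in one mention, which mirrors how "John Smith" is treated as a single entity.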

If you go through the dependency graph, each edge gives you the index of its governor word and its dependent word.

Likewise, if you access the list of tokens for an entity mention, you can find the index of each word in that mention.

With a little more code you can match those indices to link the two together and form the entity mention/verb pairs you were requesting.
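
One possible way to do that linking is sketched below. To keep it runnable without CoreNLP, it uses two hypothetical stand-in classes, Edge and Mention, that hold only the values the sample code above prints (words, 1-based token indices, POS tags, NER tag). The edges are written by hand to resemble a parse of the sample sentence and are not actual parser output; the rule "governor POS tag starts with VB" is likewise just one simple assumption about what counts as a verb.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: plain data classes stand in for CoreNLP objects.
public class MentionVerbLinker {

  // Stand-in for a dependency edge: governor word/index/POS tag, relation, dependent index.
  public static final class Edge {
    final String govWord; final int govIndex; final String govTag;
    final String relation; final int depIndex;
    public Edge(String govWord, int govIndex, String govTag, String relation, int depIndex) {
      this.govWord = govWord; this.govIndex = govIndex; this.govTag = govTag;
      this.relation = relation; this.depIndex = depIndex;
    }
  }

  // Stand-in for an entity mention: text, first and last token index (1-based, inclusive), NER tag.
  public static final class Mention {
    final String text; final int firstIndex; final int lastIndex; final String nerTag;
    public Mention(String text, int firstIndex, int lastIndex, String nerTag) {
      this.text = text; this.firstIndex = firstIndex; this.lastIndex = lastIndex;
      this.nerTag = nerTag;
    }
  }

  // Pairs a mention with a verb when a verb governs any token inside the mention.
  // Each result is "mention text<TAB>relation<TAB>verb".
  public static List<String> link(List<Edge> edges, List<Mention> mentions) {
    List<String> pairs = new ArrayList<>();
    for (Mention m : mentions) {
      for (Edge e : edges) {
        boolean governorIsVerb = e.govTag.startsWith("VB");
        boolean dependentInMention = e.depIndex >= m.firstIndex && e.depIndex <= m.lastIndex;
        if (governorIsVerb && dependentInMention) {
          pairs.add(m.text + "\t" + e.relation + "\t" + e.govWord);
        }
      }
    }
    return pairs;
  }

  // Hand-written data resembling "John Smith went to the store."
  public static List<String> sampleSentencePairs() {
    List<Edge> edges = Arrays.asList(
        new Edge("went", 3, "VBD", "nsubj", 2),     // went -> Smith
        new Edge("Smith", 2, "NNP", "compound", 1), // Smith -> John
        new Edge("went", 3, "VBD", "nmod", 6));     // went -> store
    List<Mention> mentions = Arrays.asList(new Mention("John Smith", 1, 2, "PERSON"));
    return link(edges, mentions);
  }

  public static void main(String[] args) {
    for (String pair : sampleSentencePairs()) {
      System.out.println(pair);
    }
  }
}
```

In real code the same comparison would use the governor and dependent indices from each SemanticGraphEdge and the IndexAnnotation of each token in the mention, as in the sample above.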

As you can see in the current code, it is quite cumbersome to access info for an entity mention, so I am going to try to improve that in 3.8.0.
