CoreNLP Stanford Dependency Format

Submitted by 丶灬走出姿态 on 2019-12-13 16:43:26

Question


Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas

From the above sentence, I am looking to obtain the following typed dependencies:

nsubjpass(submitted, Bills)
auxpass(submitted, were)
agent(submitted, Brownback)
nn(Brownback, Senator)
appos(Brownback, Republican)
prep_of(Republican, Kansas)
prep_on(Bills, ports)
conj_and(ports, immigration)
prep_on(Bills, immigration)

This should be possible per Table 1 and Figure 1 of the Stanford Dependencies documentation.

Using the code below, I have only been able to obtain the following dependencies (this is the code's output):

root(ROOT-0, submitted-7)
nmod:on(Bills-1, ports-3)
nmod:on(Bills-1, immigration-5)
case(ports-3, on-2)
cc(ports-3, and-4)
conj:and(ports-3, immigration-5)
nsubjpass(submitted-7, Bills-1)
auxpass(submitted-7, were-6)
nmod:agent(submitted-7, Brownback-10)
case(Brownback-10, by-8)
compound(Brownback-10, Senator-9)
punct(Brownback-10, ,-11)
appos(Brownback-10, Republican-12)
nmod:of(Republican-12, Kansas-14)
case(Kansas-14, of-13)

Question - How do I achieve the desired output above?

Code

import java.util.Collection;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.AnnotationPipeline;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.EnhancedPlusPlusDependenciesAnnotation;
import edu.stanford.nlp.trees.TypedDependency;
import edu.stanford.nlp.util.CoreMap;
import edu.stanford.nlp.util.PropertiesUtils;

public void processTestCoreNLP() {
    String text = "Bills on ports and immigration were submitted " +
            "by Senator Brownback, Republican of Kansas";

    Annotation annotation = new Annotation(text);
    Properties properties = PropertiesUtils.asProperties(
            "annotators", "tokenize,ssplit,pos,lemma,depparse"
    );

    AnnotationPipeline pipeline = new StanfordCoreNLP(properties);

    pipeline.annotate(annotation);

    for (CoreMap sentence : annotation.get(SentencesAnnotation.class)) {
        SemanticGraph sg = sentence.get(EnhancedPlusPlusDependenciesAnnotation.class);
        Collection<TypedDependency> dependencies = sg.typedDependencies();
        for (TypedDependency td : dependencies) {
            System.out.println(td);
        }
    }
}

Answer 1:


If you want to get the CCprocessed and collapsed Stanford Dependencies (SD) for a sentence through the NN dependency parser, you'll have to set a property to circumvent a small bug in CoreNLP.

However, please note that we are no longer maintaining the Stanford Dependencies code and unless you have really good reasons to use SD, we'd recommend using Universal Dependencies for any new projects. Take a look at the Universal Dependencies (UD) documentation and Schuster and Manning (2016) for more information on the UD representation.

To obtain the CCprocessed and collapsed SD representation, set the depparse.language property as follows:

public void processTestCoreNLP() {
  String text = "Bills on ports and immigration were submitted " +
        "by Senator Brownback, Republican of Kansas";

  Annotation annotation = new Annotation(text);
  Properties properties = PropertiesUtils.asProperties(
        "annotators", "tokenize,ssplit,pos,lemma,depparse");

  properties.setProperty("depparse.language", "English");

  AnnotationPipeline pipeline = new StanfordCoreNLP(properties);

  pipeline.annotate(annotation);

  for (CoreMap sentence : annotation.get(SentencesAnnotation.class)) {
    SemanticGraph sg = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
    Collection<TypedDependency> dependencies = sg.typedDependencies();
    for (TypedDependency td : dependencies) {
      System.out.println(td);
    }
  }
}



Answer 2:


CoreNLP recently switched from the old Stanford Dependencies format (the format in your desired output) to Universal Dependencies. My first recommendation is to use the new format if at all possible. Future development on the parsers will target Universal Dependencies, and the new format is in many ways similar to the old one, modulo cosmetic changes (e.g., prep becomes nmod).

However, if you'd like to get the old dependency format out, you can do so with the CollapsedCCProcessedDependenciesAnnotation annotation.
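Concretely, the only change relative to the question's code is the annotation class requested from the sentence (you may still need the `depparse.language` workaround from the first answer). A minimal sketch of the sentence loop, using the same pipeline as in the question:

```java
// Inside the question's sentence loop: request the collapsed,
// CC-processed Stanford Dependencies graph instead of the
// enhanced++ Universal Dependencies graph.
for (CoreMap sentence : annotation.get(SentencesAnnotation.class)) {
    SemanticGraph sg = sentence.get(
            SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation.class);
    for (TypedDependency td : sg.typedDependencies()) {
        System.out.println(td);
    }
}
```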



Source: https://stackoverflow.com/questions/45202486/corenlp-stanford-dependency-format
