Stanford CoreNLP pipeline coref: parsing some short strings (with few mentions) throws IndexOutOfBoundsException

Submitted by 倾然丶 夕夏残阳落幕 on 2020-01-06 01:54:39

Question


BACKGROUND: I'm importing the Stanford CoreNLP library into my Clojure project. I was using version 3.5.1 but recently jumped directly to version 3.6.0, skipping 3.5.2. Because I had been getting coreference information from the dcoref annotator, the update required small modifications so that my program uses the coref annotator instead.

In the past (v3.5.1), when I created a pipeline with the following annotators

"tokenize, ssplit, pos, lemma, ner, parse, depparse, dcoref, quote, entitymentions",

I could parse a sentence such as the following without error:

"I ate bread".

If I remember correctly, extracting the coreference chains from the resulting annotated document would just return a null value, or maybe an empty array. That detail is inconsequential, though, because at least the annotated document was created without error.

Now, when I create a pipeline with the following annotators:

"tokenize, ssplit, pos, lemma, ner, parse, depparse, mention, coref, quote, entitymentions",

and then try to parse that same sentence (or any other sentence with only 1 or 0 "mentions"), I get an IndexOutOfBoundsException with the following trace:

actual: java.lang.RuntimeException: Error annotating document with coref
 at edu.stanford.nlp.scoref.StatisticalCorefSystem.annotate (StatisticalCorefSystem.java:79)
    edu.stanford.nlp.scoref.StatisticalCorefSystem.annotate (StatisticalCorefSystem.java:62)
    edu.stanford.nlp.pipeline.CorefAnnotator.annotate (CorefAnnotator.java:100)
    edu.stanford.nlp.pipeline.AnnotationPipeline.annotate (AnnotationPipeline.java:68)
    edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate (StanfordCoreNLP.java:491)
    nlp.core$parse_text.invoke (core.clj:199)
    nlp.focus_scorer.process$lexchain_features.invoke (process.clj:63)
    nlp.focus_scorer.process_test/fn (process_test.clj:49)
    clojure.test$test_var$fn__7670.invoke (test.clj:704)
    clojure.test$test_var.invoke (test.clj:704)
    clojure.test$test_vars$fn__7692$fn__7697.invoke (test.clj:722)
    clojure.test$default_fixture.invoke (test.clj:674)
    clojure.test$test_vars$fn__7692.invoke (test.clj:722)
    clojure.test$default_fixture.invoke (test.clj:674)
    clojure.test$test_vars.invoke (test.clj:718)
    clojure.test$test_all_vars.invoke (test.clj:728)
    clojure.test$test_ns.invoke (test.clj:747)
    clojure.core$map$fn__4553.invoke (core.clj:2624)
    clojure.lang.LazySeq.sval (LazySeq.java:40)
    clojure.lang.LazySeq.seq (LazySeq.java:49)
    clojure.lang.Cons.next (Cons.java:39)
    clojure.lang.RT.boundedLength (RT.java:1735)
    clojure.lang.RestFn.applyTo (RestFn.java:130)
    clojure.core$apply.invoke (core.clj:632)
    clojure.test$run_tests.doInvoke (test.clj:762)
    clojure.lang.RestFn.invoke (RestFn.java:408)
    user$eval13163.invoke (form-init7737210093072696705.clj:1)
    clojure.lang.Compiler.eval (Compiler.java:6782)
    clojure.lang.Compiler.eval (Compiler.java:6745)
    clojure.core$eval.invoke (core.clj:3081)
    clojure.main$repl$read_eval_print__7099$fn__7102.invoke (main.clj:240)
    clojure.main$repl$read_eval_print__7099.invoke (main.clj:240)
    clojure.main$repl$fn__7108.invoke (main.clj:258)
    clojure.main$repl.doInvoke (main.clj:258)
    clojure.lang.RestFn.invoke (RestFn.java:1523)
    clojure.tools.nrepl.middleware.interruptible_eval$evaluate$fn__909.invoke (interruptible_eval.clj:58)
    clojure.lang.AFn.applyToHelper (AFn.java:152)
    clojure.lang.AFn.applyTo (AFn.java:144)
    clojure.core$apply.invoke (core.clj:630)
    clojure.core$with_bindings_STAR_.doInvoke (core.clj:1868)
    clojure.lang.RestFn.invoke (RestFn.java:425)
    clojure.tools.nrepl.middleware.interruptible_eval$evaluate.invoke (interruptible_eval.clj:56)
    clojure.tools.nrepl.middleware.interruptible_eval$interruptible_eval$fn__951$fn__954.invoke (interruptible_eval.clj:191)
    clojure.tools.nrepl.middleware.interruptible_eval$run_next$fn__946.invoke (interruptible_eval.clj:159)
    clojure.lang.AFn.run (AFn.java:22)
    java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1142)
    java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:617)
    java.lang.Thread.run (Thread.java:745)
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
 at java.util.ArrayList$SubList.rangeCheck (ArrayList.java:1217)
    java.util.ArrayList$SubList.get (ArrayList.java:1034)
    edu.stanford.nlp.scoref.Clusterer$State.setClusters (Clusterer.java:349)
    edu.stanford.nlp.scoref.Clusterer$State.<init> (Clusterer.java:322)
    edu.stanford.nlp.scoref.Clusterer.getClusterMerges (Clusterer.java:58)
    edu.stanford.nlp.scoref.ClusteringCorefSystem.runCoref (ClusteringCorefSystem.java:63)
    edu.stanford.nlp.scoref.StatisticalCorefSystem.annotate (StatisticalCorefSystem.java:68)
    edu.stanford.nlp.scoref.StatisticalCorefSystem.annotate (StatisticalCorefSystem.java:62)
    edu.stanford.nlp.pipeline.CorefAnnotator.annotate (CorefAnnotator.java:100)
    edu.stanford.nlp.pipeline.AnnotationPipeline.annotate (AnnotationPipeline.java:68)
    edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate (StanfordCoreNLP.java:491)
    nlp.core$parse_text.invoke (core.clj:199)
    nlp.focus_scorer.process$lexchain_features.invoke (process.clj:63)
    nlp.focus_scorer.process_test/fn (process_test.clj:49)
    clojure.test$test_var$fn__7670.invoke (test.clj:704)
    clojure.test$test_var.invoke (test.clj:704)
    clojure.test$test_vars$fn__7692$fn__7697.invoke (test.clj:722)
    clojure.test$default_fixture.invoke (test.clj:674)
    clojure.test$test_vars$fn__7692.invoke (test.clj:722)
    clojure.test$default_fixture.invoke (test.clj:674)
    clojure.test$test_vars.invoke (test.clj:718)
    clojure.test$test_all_vars.invoke (test.clj:728)
    clojure.test$test_ns.invoke (test.clj:747)
    clojure.core$map$fn__4553.invoke (core.clj:2624)
    clojure.lang.LazySeq.sval (LazySeq.java:40)
    clojure.lang.LazySeq.seq (LazySeq.java:49)
    clojure.lang.Cons.next (Cons.java:39)
    clojure.lang.RT.boundedLength (RT.java:1735)
    clojure.lang.RestFn.applyTo (RestFn.java:130)
    clojure.core$apply.invoke (core.clj:632)
    clojure.test$run_tests.doInvoke (test.clj:762)
    clojure.lang.RestFn.invoke (RestFn.java:408)
    user$eval13163.invoke (form-init7737210093072696705.clj:1)
    clojure.lang.Compiler.eval (Compiler.java:6782)
    clojure.lang.Compiler.eval (Compiler.java:6745)
    clojure.core$eval.invoke (core.clj:3081)
    clojure.main$repl$read_eval_print__7099$fn__7102.invoke (main.clj:240)
    clojure.main$repl$read_eval_print__7099.invoke (main.clj:240)
    clojure.main$repl$fn__7108.invoke (main.clj:258)
    clojure.main$repl.doInvoke (main.clj:258)
    clojure.lang.RestFn.invoke (RestFn.java:1523)
    clojure.tools.nrepl.middleware.interruptible_eval$evaluate$fn__909.invoke (interruptible_eval.clj:58)
    clojure.lang.AFn.applyToHelper (AFn.java:152)
    clojure.lang.AFn.applyTo (AFn.java:144)
    clojure.core$apply.invoke (core.clj:630)
    clojure.core$with_bindings_STAR_.doInvoke (core.clj:1868)
    clojure.lang.RestFn.invoke (RestFn.java:425)
    clojure.tools.nrepl.middleware.interruptible_eval$evaluate.invoke (interruptible_eval.clj:56)
    clojure.tools.nrepl.middleware.interruptible_eval$interruptible_eval$fn__951$fn__954.invoke (interruptible_eval.clj:191)
    clojure.tools.nrepl.middleware.interruptible_eval$run_next$fn__946.invoke (interruptible_eval.clj:159)
    clojure.lang.AFn.run (AFn.java:22)
    java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1142)
    java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:617)
    java.lang.Thread.run (Thread.java:745)
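
The `Caused by` frames above end in `ArrayList$SubList.get` with `Index: 0, Size: 0`, i.e. element 0 was requested from an empty sub-list. The following Java sketch reproduces only that failure mode with the standard library. It does not use CoreNLP, the names `EmptySubListDemo`, `firstOrThrow`, and `firstOrNull` are hypothetical, and whether `Clusterer$State.setClusters` indexes its list in exactly this way is an assumption based on the trace, not on the library source.

```java
import java.util.ArrayList;
import java.util.List;

public class EmptySubListDemo {

    // Reads element 0 of a sub-list view without checking its size.
    // Throws IndexOutOfBoundsException when the list is empty.
    static String firstOrThrow(List<String> mentions) {
        List<String> view = mentions.subList(0, mentions.size());
        return view.get(0);
    }

    // Same access, guarded by an emptiness check; returns null for an empty list.
    static String firstOrNull(List<String> mentions) {
        List<String> view = mentions.subList(0, mentions.size());
        return view.isEmpty() ? null : view.get(0);
    }

    public static void main(String[] args) {
        List<String> noMentions = new ArrayList<>();
        try {
            firstOrThrow(noMentions);
        } catch (IndexOutOfBoundsException e) {
            // The exact message text depends on the Java version.
            System.out.println("caught: " + e.getMessage());
        }
        System.out.println("guarded: " + firstOrNull(noMentions));
    }
}
```

The guarded variant shows the kind of size check that avoids the exception when there is nothing to cluster.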

Am I possibly doing something wrong? I realize that using Clojure instead of Java might be causing some issue, but I never had a problem with version 3.5.1. The error appears to be thrown from the annotation step in edu.stanford.nlp.scoref.StatisticalCorefSystem.annotate, but I'm not sure what I can do about that (other than keeping two pipeline objects, one with the coref annotator and one without: parse the sentence without coref, count the mentions, and then parse with coref only if I see more than one mention... which seems a little too much).
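
The two-pipeline workaround described in the parenthetical can be sketched in Java as a small gate. This is only an illustration of the control flow: the class `CorefGate`, its `annotate` method, and the three function parameters are hypothetical stand-ins, and no CoreNLP classes are used. In real code the two `Function` arguments would wrap the pipelines without and with the coref annotator, and the counter would read the mentions from the annotated document.

```java
import java.util.function.Function;
import java.util.function.ToIntFunction;

public class CorefGate {

    // Runs the pipeline without coref first, counts mentions in its result,
    // and runs the coref pipeline only when more than one mention was found.
    static <D> D annotate(String text,
                          Function<String, D> withoutCoref,
                          ToIntFunction<D> mentionCount,
                          Function<String, D> withCoref) {
        D base = withoutCoref.apply(text);
        if (mentionCount.applyAsInt(base) <= 1) {
            return base; // 0 or 1 mentions: skip coref entirely
        }
        return withCoref.apply(text);
    }

    public static void main(String[] args) {
        // Stand-ins for the two pipelines; they only tag the text they receive.
        Function<String, String> withoutCoref = t -> "no-coref:" + t;
        Function<String, String> withCoref = t -> "coref:" + t;

        // The mention counts are fixed here purely for demonstration.
        System.out.println(annotate("I ate bread", withoutCoref, d -> 1, withCoref));
        System.out.println(annotate("John said he ate bread", withoutCoref, d -> 2, withCoref));
    }
}
```

The cost of this design is that every text with two or more mentions is parsed twice, which is the overhead the question is hoping to avoid.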


Answer 1:


3.6.0 features major changes to coreference. This issue is a bug in Stanford CoreNLP 3.6.0. If you re-download the distribution, the bug should be fixed in the version currently on the site. It should also be fixed in the upcoming Maven release.



Source: https://stackoverflow.com/questions/34902540/stanford-corenlp-pipeline-coref-parsing-some-short-strings-with-few-mentions
