CoreNLP on Apache Spark

Posted by 谁都会走 on 2019-12-25 06:14:45

Question


I'm not sure whether this is a Spark issue or an NLP issue, so any help is appreciated. I'm trying to run the Stanford CoreNLP library on Apache Spark, and when I run it on multiple cores I get the exception below. I'm using the latest version of the library, which is thread safe.

It happens during the map phase, on this line:

 pipeline.annotate(document);

java.util.ConcurrentModificationException
    at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
    at java.util.ArrayList$Itr.next(ArrayList.java:851)
    at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1042)
    at edu.stanford.nlp.trees.GrammaticalStructure.analyzeNode(GrammaticalStructure.java:463)
    at edu.stanford.nlp.trees.GrammaticalStructure.analyzeNode(GrammaticalStructure.java:488)
    at edu.stanford.nlp.trees.GrammaticalStructure.analyzeNode(GrammaticalStructure.java:488)
    at edu.stanford.nlp.trees.GrammaticalStructure.analyzeNode(GrammaticalStructure.java:488)
    at edu.stanford.nlp.trees.GrammaticalStructure.analyzeNode(GrammaticalStructure.java:488)
    at edu.stanford.nlp.trees.GrammaticalStructure.analyzeNode(GrammaticalStructure.java:488)
    at edu.stanford.nlp.trees.GrammaticalStructure.<init>(GrammaticalStructure.java:201)
    at edu.stanford.nlp.trees.EnglishGrammaticalStructure.<init>(EnglishGrammaticalStructure.java:89)
    at edu.stanford.nlp.semgraph.SemanticGraphFactory.makeFromTree(SemanticGraphFactory.java:139)
    at edu.stanford.nlp.pipeline.DeterministicCorefAnnotator.annotate(DeterministicCorefAnnotator.java:89)
    at edu.stanford.nlp.pipeline.AnnotationPipeline.annotate(AnnotationPipeline.java:68)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate(StanfordCoreNLP.java:412)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.process(StanfordCoreNLP.java:441)
    at sampleApp.WordProcessor$2.call(WordProcessor.java:69)
    at sampleApp.WordProcessor$2.call(WordProcessor.java:1)

Answer 1:


I think this is a CoreNLP issue rather than a Spark one.

See also the related question "Concurrent processing using Stanford CoreNLP (3.5.2)".

I had the same problem, and switching to a build from the latest GitHub revision (as of the day I wrote this) solved it. In short, I think there was a bug in CoreNLP 3.5.2 that has since been fixed.
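If upgrading is not immediately possible, one stopgap is to let only one thread per JVM run the annotate call at a time. This is an assumption on my part, not something tested against CoreNLP 3.5.2. The sketch below shows the idea with plain Java only: `Annotator`, `annotateSerialized`, and `runDemo` are hypothetical names, and `Annotator` merely stands in for `pipeline.annotate(document)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SerializedAnnotate {
    // Hypothetical stand-in for pipeline.annotate(document); not a CoreNLP type.
    interface Annotator {
        void annotate(String document);
    }

    // One lock per JVM, so every task running in the same executor shares it.
    private static final Object LOCK = new Object();

    // Only one thread at a time may be inside the wrapped call.
    static void annotateSerialized(Annotator annotator, String document) {
        synchronized (LOCK) {
            annotator.annotate(document);
        }
    }

    // Starts `threads` workers that each make `callsPerThread` guarded calls,
    // then returns how many calls were recorded in the shared list.
    static int runDemo(int threads, int callsPerThread) {
        List<String> sharedState = new ArrayList<>(); // deliberately not thread safe
        Annotator unsafe = doc -> sharedState.add(doc);

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    annotateSerialized(unsafe, "doc");
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sharedState.size();
    }

    public static void main(String[] args) {
        System.out.println("recorded calls: " + runDemo(8, 1000));
    }
}
```

The cost is that annotation no longer runs in parallel inside a single executor JVM, so this only makes sense as a temporary measure until a fixed CoreNLP build can be used. Spark can still spread work across several executors.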




Answer 2:


While it's a bit hard to tell from that small amount of code, I think the key is the line java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1042). Most likely you are trying to modify something that doesn't support modification. If so, the fix would be to make a copy of your input.
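To help read that stack frame, here is a small, self-contained sketch in plain Java (no CoreNLP or Spark involved) of how an unmodifiable view behaves. The class and method names are made up for illustration. The iterator of `Collections.unmodifiableCollection` delegates to the iterator of the backing `ArrayList`, so a `ConcurrentModificationException` can surface through the view when the backing list is changed elsewhere during iteration. Writing through the view itself fails with a different exception, `UnsupportedOperationException`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.List;

public class UnmodifiableViewDemo {
    // Iterates the read-only view while the backing list is structurally changed.
    static String iterateWhileBackingListChanges() {
        List<String> backing = new ArrayList<>(Arrays.asList("a", "b", "c"));
        Collection<String> view = Collections.unmodifiableCollection(backing);
        try {
            for (String ignored : view) {
                backing.add("x"); // change made behind the view's back
            }
            return "none";
        } catch (ConcurrentModificationException e) {
            return e.getClass().getSimpleName();
        }
    }

    // Attempts to add an element through the read-only view itself.
    static String writeThroughView() {
        List<String> backing = new ArrayList<>(Arrays.asList("a", "b", "c"));
        Collection<String> view = Collections.unmodifiableCollection(backing);
        try {
            view.add("x");
            return "none";
        } catch (UnsupportedOperationException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(iterateWhileBackingListChanges());
        System.out.println(writeThroughView());
    }
}
```

In the trace from the question the exception type is `ConcurrentModificationException`, which matches the first case: some list was changed while `GrammaticalStructure.analyzeNode` was iterating over a read-only view of it.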



Source: https://stackoverflow.com/questions/30677321/corenlp-on-apache-spark
