Scala - spark-corenlp - java.lang.ClassNotFoundException


Question


I want to run the spark-corenlp example, but I get a java.lang.ClassNotFoundException error when running spark-submit.

Here is the Scala code, from the GitHub example, which I put into an object where I define a SparkContext and a SQLContext.

analyzer.Sentiment.scala:

package analyzer

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._
import com.databricks.spark.corenlp.functions._

object Sentiment {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Sentiment")
    val sc = new SparkContext(conf)

    // toDF comes from sqlContext.implicits, so the import has to follow
    // the creation of a SQLContext; it cannot sit at the top of the file
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val input = Seq(
      (1, "<xml>Stanford University is located in California. It is a great university.</xml>")
    ).toDF("id", "text")

    val output = input
      .select(cleanxml('text).as('doc))
      .select(explode(ssplit('doc)).as('sen))
      .select('sen, tokenize('sen).as('words), ner('sen).as('nerTags), sentiment('sen).as('sentiment))

    output.show(truncate = false)
  }
}

I am using the build.sbt provided by spark-corenlp; I only modified the scalaVersion and sparkVersion to my own.

version := "1.0"

scalaVersion := "2.11.8"

initialize := {
  val _ = initialize.value
  val required = VersionNumber("1.8")
  val current = VersionNumber(sys.props("java.specification.version"))
  assert(VersionNumber.Strict.isCompatible(current, required), s"Java $required required.")
}

sparkVersion := "1.5.2"

// change the value below to change the directory where your zip artifact will be created
spDistDirectory := target.value

sparkComponents += "mllib"

spName := "databricks/spark-corenlp"

licenses := Seq("GPL-3.0" -> url("http://opensource.org/licenses/GPL-3.0"))

resolvers += Resolver.mavenLocal

libraryDependencies ++= Seq(
  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" classifier "models",
  "com.google.protobuf" % "protobuf-java" % "2.6.1"
)

Then I created my jar, without issues, by running:

sbt package

Finally, I submitted my job to Spark:

spark-submit --class "analyzer.Sentiment" --master local[4] target/scala-2.11/sentimentanalizer_2.11-0.1-SNAPSHOT.jar 

But I get the following error:

java.lang.ClassNotFoundException: analyzer.Sentiment
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:173)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:641)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

My file Sentiment.scala is correctly located in a package named "analyzer".

    $ find .
    ./src
    ./src/analyzer
    ./src/analyzer/Sentiment.scala
    ./src/com
    ./src/com/databricks
    ./src/com/databricks/spark
    ./src/com/databricks/spark/corenlp
    ./src/com/databricks/spark/corenlp/CoreNLP.scala
    ./src/com/databricks/spark/corenlp/functions.scala
    ./src/com/databricks/spark/corenlp/StanfordCoreNLPWrapper.scala

When I ran the SimpleApp example from the Spark Quick Start, I noticed that MySimpleProject/bin/ contained a SimpleApp.class, while MySentimentProject/bin is empty. So I tried cleaning my project (I am using Eclipse for Scala).

I think this is because I need to generate Sentiment.class, but I don't know how to do it: it was done automatically with SimpleApp.scala, and when I try to run/build with Eclipse for Scala, it crashes.


Answer 1:


Maybe you should try adding

scalaSource in Compile := baseDirectory.value / "src"

to your build.sbt, because the sbt documentation states that "the directory that contains the main Scala sources is by default src/main/scala".
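
For context, a minimal build.sbt using that setting might look like the sketch below. The name and versions are taken from your question, and the Spark artifacts are declared explicitly since sparkVersion and sparkComponents are only available with the sbt-spark-package plugin; treat this as a sketch, not the canonical spark-corenlp build.

name := "sentimentanalizer"

version := "0.1-SNAPSHOT"

scalaVersion := "2.11.8"

// point sbt at the non-standard source layout used in the question
scalaSource in Compile := baseDirectory.value / "src"

libraryDependencies ++= Seq(
  // Spark itself is provided by spark-submit at runtime
  "org.apache.spark" %% "spark-core" % "1.5.2" % "provided",
  "org.apache.spark" %% "spark-sql" % "1.5.2" % "provided",
  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" classifier "models",
  "com.google.protobuf" % "protobuf-java" % "2.6.1"
)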

Or just move your source code into this structure:

$ find .
./src
./src/main
./src/main/scala
./src/main/scala/analyzer
./src/main/scala/analyzer/Sentiment.scala
./src/main/scala/com
./src/main/scala/com/databricks
./src/main/scala/com/databricks/spark
./src/main/scala/com/databricks/spark/corenlp
./src/main/scala/com/databricks/spark/corenlp/CoreNLP.scala
./src/main/scala/com/databricks/spark/corenlp/functions.scala
./src/main/scala/com/databricks/spark/corenlp/StanfordCoreNLPWrapper.scala
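
Either way, after a clean rebuild the compiled class should end up inside the jar. A quick sanity check, assuming the jar name from your spark-submit command, is something like:

sbt clean package
jar tf target/scala-2.11/sentimentanalizer_2.11-0.1-SNAPSHOT.jar | grep Sentiment

If analyzer/Sentiment.class shows up in the listing, spark-submit will be able to find the class. Note that sbt package bundles only your own classes, so you may additionally need to put the CoreNLP jars on the classpath (for example via --jars) or build a fat jar with sbt-assembly.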


Source: https://stackoverflow.com/questions/37978991/scala-spark-corenlp-java-lang-classnotfoundexception
