java.lang.ClassNotFoundException: com.datastax.spark.connector.rdd.partitioner.CassandraPartition

Submitted by 房东的猫 on 2020-05-17 07:01:25

Question


I've been working with Cassandra for a little while and now I'm trying to set up Spark and the spark-cassandra-connector. I'm using IntelliJ IDEA on Windows 10 (my first time with IntelliJ IDEA and Scala, too).

build.gradle

apply plugin: 'scala'
apply plugin: 'idea'
apply plugin: 'eclipse'

repositories {
    mavenCentral()

    flatDir {
        dirs 'runtime libs'
    }
}

idea {
    project {
        jdkName = '1.8'
        languageLevel = '1.8'
    }
}

dependencies {
    compile group: 'org.apache.spark', name: 'spark-core_2.11', version: '2.4.5'
    compile group: 'org.apache.spark', name: 'spark-sql_2.11', version: '2.4.5'
    compile group: 'org.scala-lang', name: 'scala-library', version: '2.11.12'
    compile group: 'com.datastax.spark', name: 'spark-cassandra-connector_2.11', version: '2.5.0'
    compile group: 'log4j', name: 'log4j', version: '1.2.17'
}

configurations.all {
    resolutionStrategy {
        force 'com.google.guava:guava:12.0.1'
    }
}

compileScala.targetCompatibility = "1.8"
compileScala.sourceCompatibility = "1.8"

jar {
    zip64 true
    archiveName = "ModuleName.jar"
    from {
        configurations.compile.collect {
            it.isDirectory() ? it : zipTree(it)
        }
    }
    manifest {
        attributes 'Main-Class': 'org.module.SentinelSparkModule'
    }
    exclude 'META-INF/*.RSA', 'META-INF/*.SF', 'META-INF/*.DSA'

}

ModuleName.scala

package org.module
import org.apache.spark.sql.SparkSession
import com.datastax.spark.connector._
import java.sql.Timestamp

object SentinelSparkModule {

  // java.sql.Timestamp is a JVM type the connector can map a Cassandra
  // timestamp column to; org.apache.spark.sql.types.TimestampType is a
  // Catalyst schema descriptor, not a field type, and was the cause of the
  // mapping error quoted below.
  case class Document(id: Int, time: Timestamp, data: String)

  def main(args: Array[String]) {
    val spark = SparkSession.builder
      .master("spark://192.168.0.3:7077")
      .appName("App")
      .config("spark.cassandra.connection.host", "127.0.0.1")
      .config("spark.cassandra.connection.port", "9042")
      .getOrCreate()

    // With a TimestampType field this threw 'Failed to map constructor
    // parameter id in org.module.ModuleName.Document to a column of
    // keyspace.table'; mapping to the case class works once the field is a
    // java.sql.Timestamp. The no-argument select() call is dropped:
    // cassandraTable fetches all columns by default.
    val documentRDD = spark.sparkContext
      .cassandraTable[Document]("keyspace", "table")
    documentRDD.take(10).foreach(println)
    spark.stop()
 }
}

I have a running Spark master at spark://192.168.0.3:7077 with one worker attached to it, but I haven't tried submitting the job as a compiled jar from the console; I'm just trying to get it to work in the IDE.

Thanks


Answer 1:


The Cassandra connector jar needs to be added to the classpath of the workers. One way to do this is to build an uber jar with all the required dependencies and submit it to the cluster.

Refer to: Building an uber jar with Gradle
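
For instance, a minimal sketch using the Shadow plugin (the plugin version here is an assumption matching the Gradle releases of that era; the plugins block replaces the apply plugin lines, and shadowJar replaces the hand-rolled jar task from the question):

plugins {
    id 'scala'
    id 'com.github.johnrengelman.shadow' version '5.2.0'
}

shadowJar {
    zip64 true
    // Same artifact name the question's jar task produced.
    archiveFileName = 'ModuleName.jar'
    manifest {
        attributes 'Main-Class': 'org.module.SentinelSparkModule'
    }
    // Everything on runtimeClasspath is bundled; Shadow strips META-INF
    // signature files (*.SF, *.DSA, *.RSA) by default, so the manual
    // exclude is not needed here.
}

Running gradlew shadowJar writes the jar to build/libs/, and it can then be submitted with spark-submit --class org.module.SentinelSparkModule build/libs/ModuleName.jar.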

Also, make sure you change the scope of the dependencies in your build file from compile to provided for all jars except the Cassandra connector; a sketch follows below.
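
In Gradle terms, the closest analogue of Maven's provided scope is the compileOnly configuration: Spark and the Scala library are already on every worker, so only the connector has to travel inside the uber jar. A sketch of the adjusted block, with the versions copied from the question:

dependencies {
    // Provided by the Spark installation on each worker: needed at compile
    // time, but kept out of the uber jar.
    compileOnly group: 'org.apache.spark', name: 'spark-core_2.11', version: '2.4.5'
    compileOnly group: 'org.apache.spark', name: 'spark-sql_2.11', version: '2.4.5'
    compileOnly group: 'org.scala-lang', name: 'scala-library', version: '2.11.12'
    compileOnly group: 'log4j', name: 'log4j', version: '1.2.17'

    // Not on the workers, so it must ship inside the jar; implementation
    // lands on runtimeClasspath, which shadowJar bundles.
    implementation group: 'com.datastax.spark', name: 'spark-cassandra-connector_2.11', version: '2.5.0'
}

One caveat: compileOnly dependencies are not on the runtime classpath, so runs launched from IntelliJ need the run configuration's "Include dependencies with 'Provided' scope" option (or an equivalent) enabled.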

Reference: https://reflectoring.io/maven-scopes-gradle-configurations/
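
Since the question launches the driver straight from the IDE: once the uber jar is built, the session itself can ship it to the executors via spark.jars. A sketch, where the jar path is an assumption about the build output:

// Same session as in the question, plus spark.jars so the assembled jar
// (path is an assumption) is distributed to the executors on the workers.
val spark = SparkSession.builder
  .master("spark://192.168.0.3:7077")
  .appName("App")
  .config("spark.cassandra.connection.host", "127.0.0.1")
  .config("spark.cassandra.connection.port", "9042")
  .config("spark.jars", "build/libs/ModuleName.jar")
  .getOrCreate()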



Source: https://stackoverflow.com/questions/61565049/java-lang-classnotfoundexception-com-datastax-spark-connector-rdd-partitioner-c
