Question
I've been working with Cassandra for a little while, and now I'm trying to set up Spark and the spark-cassandra-connector. I'm using IntelliJ IDEA on Windows 10 (my first time with both IntelliJ IDEA and Scala).
build.gradle
apply plugin: 'scala'
apply plugin: 'idea'
apply plugin: 'eclipse'

repositories {
    mavenCentral()
    flatDir {
        dirs 'runtime libs'
    }
}

idea {
    project {
        jdkName = '1.8'
        languageLevel = '1.8'
    }
}

dependencies {
    compile group: 'org.apache.spark', name: 'spark-core_2.11', version: '2.4.5'
    compile group: 'org.apache.spark', name: 'spark-sql_2.11', version: '2.4.5'
    compile group: 'org.scala-lang', name: 'scala-library', version: '2.11.12'
    compile group: 'com.datastax.spark', name: 'spark-cassandra-connector_2.11', version: '2.5.0'
    compile group: 'log4j', name: 'log4j', version: '1.2.17'
}

configurations.all {
    resolutionStrategy {
        force 'com.google.guava:guava:12.0.1'
    }
}

compileScala.targetCompatibility = "1.8"
compileScala.sourceCompatibility = "1.8"

jar {
    zip64 true
    archiveName = "ModuleName.jar"
    from {
        configurations.compile.collect {
            it.isDirectory() ? it : zipTree(it)
        }
    }
    manifest {
        attributes 'Main-Class': 'org.module.ModuleName'
    }
    exclude 'META-INF/*.RSA', 'META-INF/*.SF', 'META-INF/*.DSA'
}
ModuleName.scala
package org.module

import org.apache.spark.sql.SparkSession
import com.datastax.spark.connector._
import org.apache.spark.sql.types.TimestampType

object SentinelSparkModule {

  case class Document(id: Int, time: TimestampType, data: String)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("spark://192.168.0.3:7077")
      .appName("App")
      .config("spark.cassandra.connection.host", "127.0.0.1")
      .config("spark.cassandra.connection.port", "9042")
      .getOrCreate()

    // I'm trying it without [Document] since it throws 'Failed to map constructor
    // parameter id in org.module.ModuleName.Document to a column of keyspace.table'
    val documentRDD = spark.sparkContext
      .cassandraTable/*[Document]*/("keyspace", "table")
      .select()
    documentRDD.take(10).foreach(println)

    spark.stop()
  }
}
I have a running Spark master at spark://192.168.0.3:7077 and a worker attached to that master, but I haven't tried submitting the job as a compiled jar from the console; I'm just trying to get it working in the IDE.
Thanks
Answer 1:
The Cassandra connector jar needs to be added to the classpath of the workers. One way to do this is to build an uber jar with all the required dependencies and submit it to the cluster with spark-submit.
Refer to: Building an uber jar with Gradle
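For illustration, a minimal sketch of such an uber-jar setup using the Shadow plugin (the plugin and version are my assumption here; the hand-rolled jar task with zipTree in the question achieves the same result):

    plugins {
        id 'com.github.johnrengelman.shadow' version '5.2.0' // assumed plugin version
    }

    // shadowJar bundles the runtime classpath into a single jar;
    // anything declared compileOnly is left out automatically
    shadowJar {
        zip64 true
        archiveName = 'ModuleName.jar'
        manifest {
            attributes 'Main-Class': 'org.module.ModuleName'
        }
    }

Running gradle shadowJar then produces build/libs/ModuleName.jar, ready to pass to spark-submit.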
Also, make sure you change the scope of the dependencies in your build file from compile to provided for all jars except the Cassandra connector.
Reference: https://reflectoring.io/maven-scopes-gradle-configurations/
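In Gradle terms, the closest equivalent of Maven's provided scope is the compileOnly configuration; a minimal sketch of the dependencies block under that assumption:

    dependencies {
        // Provided by the Spark runtime on the cluster: compile against these,
        // but keep them out of the uber jar
        compileOnly group: 'org.apache.spark', name: 'spark-core_2.11', version: '2.4.5'
        compileOnly group: 'org.apache.spark', name: 'spark-sql_2.11', version: '2.4.5'
        compileOnly group: 'org.scala-lang', name: 'scala-library', version: '2.11.12'
        compileOnly group: 'log4j', name: 'log4j', version: '1.2.17'
        // Not shipped with Spark, so it has to be bundled into the jar
        compile group: 'com.datastax.spark', name: 'spark-cassandra-connector_2.11', version: '2.5.0'
    }

With this split, the jar task from the question (which collects configurations.compile) bundles only the connector and its transitive dependencies, avoiding clashes with the classes Spark already ships.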
Source: https://stackoverflow.com/questions/61565049/java-lang-classnotfoundexception-com-datastax-spark-connector-rdd-partitioner-c