spark-cassandra-connector

scala.ScalaReflectionException: <none> is not a term

Submitted by 南楼画角 on 2019-12-10 02:04:08
Question: I have the following piece of code in Spark:

rdd
  .map(processFunction(_))
  .saveToCassandra("keyspace", "tableName")

where

def processFunction(src: String): Seq[Any] = src match {
  case "a" => List(A("a", 123112, "b"), A("b", 142342, "c"))
  case "b" => List(B("d", 12312, "e", "f"), B("g", 12312, "h", "i"))
}

and:

case class A(entity: String, time: Long, value: String)
case class B(entity: String, time: Long, value1: String, value2: String)

saveToCassandra expects a collection of objects and …
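The excerpt cuts off, so what follows is only a hedged sketch of one common workaround, not the asker's solution: saveToCassandra derives its column mapping from a concrete row type (a case class, tuple, or CassandraRow), and an element type of Any defeats that derivation, which is where the "<none> is not a term" reflection error tends to surface. Splitting the data into two concretely typed RDDs and writing each to its own table keeps every element type concrete (the contact point and table names below are placeholders):

    import com.datastax.spark.connector._
    import org.apache.spark.{SparkConf, SparkContext}

    case class A(entity: String, time: Long, value: String)
    case class B(entity: String, time: Long, value1: String, value2: String)

    val conf = new SparkConf()
      .setAppName("typed-save-sketch")
      .set("spark.cassandra.connection.host", "127.0.0.1")       // placeholder contact point
    val sc = new SparkContext(conf)

    val rdd = sc.parallelize(Seq("a", "b"))                       // stand-in for the real source RDD

    val as = rdd.collect { case "a" => A("a", 123112L, "b") }     // RDD[A]
    val bs = rdd.collect { case "b" => B("d", 12312L, "e", "f") } // RDD[B]

    as.saveToCassandra("keyspace", "table_a")   // columns derived from case class A
    bs.saveToCassandra("keyspace", "table_b")   // columns derived from case class B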

Unable to serialize SparkContext in foreachRDD

Submitted by 吃可爱长大的小学妹 on 2019-12-09 23:00:53
Question: I am trying to save streaming data from Kafka into Cassandra. I am able to read and parse the data, but when I call the lines below to save it I get a Task not Serializable exception. My class extends Serializable, but I am not sure why I am seeing this error; I didn't get much help even after three hours of searching. Can somebody give me any pointers?

val collection = sc.parallelize(Seq((obj.id, obj.data)))
collection.saveToCassandra("testKS", "testTable", SomeColumns("id", "data"))
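The usual culprit is that sc ends up inside a closure that Spark ships to the executors: sc.parallelize is a driver-only call, so invoking it inside rdd.map/rdd.foreach (or anything else that runs on the executors) pulls the SparkContext into the task closure, and the SparkContext is not serializable. A hedged sketch of the usual structure, assuming the parsed messages form a DStream; everything except the keyspace, table, and column names from the question is invented:

    import com.datastax.spark.connector._
    import org.apache.spark.streaming.dstream.DStream

    case class Record(id: String, data: String)        // assumed shape of a parsed message

    // The foreachRDD body runs on the driver; the closures below capture only the
    // Record values, never the SparkContext, so the tasks stay serializable.
    def saveStream(parsed: DStream[Record]): Unit =
      parsed.foreachRDD { rdd =>
        rdd.map(r => (r.id, r.data))
           .saveToCassandra("testKS", "testTable", SomeColumns("id", "data"))
      }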

Error when using filter(), map(), … in the Spark Java API (org.apache.spark.SparkException)

Submitted by |▌冷眼眸甩不掉的悲伤 on 2019-12-08 06:20:15
Question: I'm new to Spark. When I use filter() through the Spark Java API I get the error below (if I collect() the whole table it works correctly and I can see all the data read from Cassandra). I checked that the master and the workers are on the same version, and when the application starts I can see it in the Spark web UI, but:

[Stage 0:> (0 + 0) / 6]
[Stage 0:> (0 + 2) / 6]
[Stage 0:> (0 + 4) / 6]
2017-08-28 16:37:16,239 ERROR TaskSetManager:70 - Task 1 in stage 0.0 failed 4 times; aborting job
2017-08-28 16:37:21,351 ERROR …

How to save spark streaming data in cassandra

Submitted by 北慕城南 on 2019-12-07 22:03:56
Question: build.sbt — below are the contents included in the build.sbt file:

val sparkVersion = "1.6.3"
scalaVersion := "2.10.5"
resolvers += "Spark Packages Repo" at "https://dl.bintray.com/spark-packages/maven"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming" % sparkVersion,
  "org.apache.spark" %% "spark-streaming-kafka" % sparkVersion)
libraryDependencies += "datastax" % "spark-cassandra-connector" % "1.6.3-s_2.10"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.1.0"

Command …
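One detail worth flagging in the excerpt: spark-sql is pinned to 1.1.0 while everything else is on 1.6.3, and mixed Spark versions on one classpath are a common source of streaming-to-Cassandra failures. A minimal, version-aligned build.sbt sketch (an assumption about the fix, not the asker's final file):

    val sparkVersion = "1.6.3"

    scalaVersion := "2.10.5"

    resolvers += "Spark Packages Repo" at "https://dl.bintray.com/spark-packages/maven"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-streaming"       % sparkVersion,
      "org.apache.spark" %% "spark-streaming-kafka" % sparkVersion,
      "org.apache.spark" %% "spark-sql"             % sparkVersion,      // keep spark-sql on the same Spark version
      "datastax"          % "spark-cassandra-connector" % "1.6.3-s_2.10" // Spark Packages coordinates, resolved via the repo above
    )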

Spark 1.5.1 + Scala 2.10 + Kafka + Cassandra = java.lang.NoSuchMethodError:

Submitted by 早过忘川 on 2019-12-07 12:52:29
I want to connect Kafka and Cassandra to Spark 1.5.1. The library versions:

scalaVersion := "2.10.6"
libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-streaming_2.10" % "1.5.1",
  "org.apache.spark" % "spark-streaming-kafka_2.10" % "1.5.1",
  "com.datastax.spark" % "spark-cassandra-connector_2.10" % "1.5.0-M2"
)

Initialization and use in the app:

val sparkConf = new SparkConf(true)
  .setMaster("local[2]")
  .setAppName("KafkaStreamToCassandraApp")
  .set("spark.executor.memory", "1g")
  .set("spark.cores.max", "1")
  .set("spark.cassandra.connection.host", "127.0.0.1")

Creates schema …
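The excerpt stops before the actual NoSuchMethodError, so only a general, hedged observation: that error almost always means two binary-incompatible versions of the same library meet at runtime, typically because the application jar bundles a Spark build that differs from the one on the cluster. A sketch of the usual dependency hygiene (not the accepted answer to this question), keeping every Spark artifact on the cluster's exact version and marking the ones the cluster already ships as provided:

    scalaVersion := "2.10.6"

    libraryDependencies ++= Seq(
      "org.apache.spark"   %% "spark-core"                % "1.5.1" % "provided", // already on the cluster
      "org.apache.spark"   %% "spark-streaming"           % "1.5.1" % "provided",
      "org.apache.spark"   %% "spark-streaming-kafka"     % "1.5.1",              // not shipped with Spark, so bundle it
      "com.datastax.spark" %% "spark-cassandra-connector" % "1.5.0-M2"
    )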

SSL between Kafka and Spark

Submitted by 拥有回忆 on 2019-12-07 12:03:19
Question: We are using Kafka and Spark Streaming and loading the data into Cassandra. We need to implement a security layer between the nodes running Kafka and the nodes running Spark. Any guidance on how to implement SSL between the Kafka and Spark nodes? Thanks, Sreeni

Source: https://stackoverflow.com/questions/37743490/ssl-between-kafka-and-spark
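SSL on this link is configured on the Kafka side of the connection: the brokers expose an SSL listener, and the Spark executors, acting as Kafka clients, are given the standard Kafka SSL consumer properties. A hedged sketch of those client properties, assuming Kafka 0.9+ brokers with SSL already enabled and the newer direct-stream integration (broker address, paths, and passwords are placeholders):

    import org.apache.kafka.common.serialization.StringDeserializer

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"       -> "broker1:9093",              // the brokers' SSL listener port
      "key.deserializer"        -> classOf[StringDeserializer],
      "value.deserializer"      -> classOf[StringDeserializer],
      "group.id"                -> "spark-consumer",
      "security.protocol"       -> "SSL",
      "ssl.truststore.location" -> "/etc/security/kafka.client.truststore.jks",
      "ssl.truststore.password" -> "changeit",
      "ssl.keystore.location"   -> "/etc/security/kafka.client.keystore.jks", // only needed if brokers require client auth
      "ssl.keystore.password"   -> "changeit",
      "ssl.key.password"        -> "changeit"
    )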

Spark is executing every single action two times

Submitted by 风流意气都作罢 on 2019-12-07 09:42:45
Question: I have created a simple Java application that uses Apache Spark to retrieve data from Cassandra, apply some transformations to it, and save it into another Cassandra table. I am using Apache Spark 1.4.1 configured in standalone cluster mode with a single master and a single slave, both located on my machine.

DataFrame customers = sqlContext.cassandraSql("SELECT email, first_name, last_name FROM customer " +
    "WHERE CAST(store_id as string) = '" + storeId + "'");
DataFrame customersWhoOrderedTheProduct = …
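The excerpt is cut off before the actions, so this is only a hedged aside rather than the answer to this question: a frequent reason for seeing the same expensive read executed more than once is that a DataFrame feeding several actions is never cached, and each action then re-evaluates the whole lineage from Cassandra. A minimal Scala sketch of that standard mitigation (the actions shown are assumptions, not the asker's code):

    import org.apache.spark.sql.DataFrame

    // customers is assumed to be the DataFrame produced by the cassandraSql query above.
    def runActions(customers: DataFrame): Long = {
      customers.cache()              // materialize the Cassandra read once
      val n = customers.count()      // first action triggers the scan
      customers.show(20)             // later actions reuse the cached partitions
      n
    }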

Cassandra Reading Benchmark with Spark

Submitted by 梦想的初衷 on 2019-12-06 10:33:01
Question: I'm benchmarking Cassandra's read performance. In the test-setup step I created clusters with 1, 2, and 4 EC2 instances and the same number of data nodes. I wrote one table with 100 million entries (a ~3 GB CSV file). Then I launch a Spark application which reads the data into an RDD using the spark-cassandra-connector. I expected the behavior to be the following: the more instances Cassandra uses (with the same number of Spark instances), the faster the reads. With the writes everything seems to be …
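For reference, the read side of such a benchmark with the connector usually amounts to a full-table scan driven by a single action; a minimal sketch, with placeholder contact point and keyspace/table names:

    import com.datastax.spark.connector._
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("cassandra-read-benchmark")
      .set("spark.cassandra.connection.host", "10.0.0.1")            // placeholder contact point
    val sc = new SparkContext(conf)

    val rows = sc.cassandraTable("benchmark_ks", "benchmark_table")  // placeholder keyspace/table
    println(rows.count())                                            // count() forces a full scan across all nodes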

How to save spark streaming data in cassandra

Submitted by 五迷三道 on 2019-12-06 10:05:06
build.sbt — below are the contents included in the build.sbt file:

val sparkVersion = "1.6.3"
scalaVersion := "2.10.5"
resolvers += "Spark Packages Repo" at "https://dl.bintray.com/spark-packages/maven"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming" % sparkVersion,
  "org.apache.spark" %% "spark-streaming-kafka" % sparkVersion)
libraryDependencies += "datastax" % "spark-cassandra-connector" % "1.6.3-s_2.10"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.1.0"

Command to initialize shell: the command below is the shell initialization procedure I followed

/usr/hdp/2.6.0 …
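The task in the title, writing a DStream to Cassandra, is normally done through the connector's streaming extension rather than through spark-sql; a minimal hedged sketch, assuming the stream has already been parsed into (id, data) pairs and using placeholder keyspace/table names:

    import com.datastax.spark.connector.SomeColumns
    import com.datastax.spark.connector.streaming._
    import org.apache.spark.streaming.dstream.DStream

    // Hypothetical helper: write every micro-batch of (id, data) pairs to Cassandra.
    def writeStream(pairs: DStream[(String, String)]): Unit =
      pairs.saveToCassandra("test_ks", "test_table", SomeColumns("id", "data"))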

Saving data back into Cassandra as RDD

Submitted by 不打扰是莪最后的温柔 on 2019-12-06 07:14:19
I am trying to read messages from Kafka, process the data, and then add the data into Cassandra as if it is an RDD. My trouble is saving the data back into Cassandra.

from __future__ import print_function
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
from pyspark import SparkConf, SparkContext

appName = 'Kafka_Cassandra_Test'
kafkaBrokers = '1.2.3.4:9092'
topic = 'test'
cassandraHosts = '1,2,3'
sparkMaster = 'spark://mysparkmaster:7077'

if __name__ == "__main__":
    conf = SparkConf()
    conf.set('spark.cassandra.connection.host', cassandraHosts)
    sc = …
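One relevant detail: the RDD-level saveToCassandra method belongs to the DataStax connector's Scala/Java API and is not exposed to PySpark RDDs; from Python the usual route is the DataFrame writer with the connector's data source. A hedged sketch of that write path, shown in Scala to match the rest of this page (the same "org.apache.spark.sql.cassandra" format string and keyspace/table options apply from the PySpark DataFrame API; the names are placeholders):

    import org.apache.spark.sql.{DataFrame, SaveMode}

    // df is assumed to be a DataFrame whose columns match the target Cassandra table.
    def writeToCassandra(df: DataFrame): Unit =
      df.write
        .format("org.apache.spark.sql.cassandra")
        .options(Map("keyspace" -> "test_ks", "table" -> "test_table"))
        .mode(SaveMode.Append)
        .save()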