spark-cassandra-connector

scala.ScalaReflectionException: <none> is not a term

Submitted by 南楼画角 on 2019-12-10 02:04:08
Question: I have the following piece of code in Spark:

rdd
  .map(processFunction(_))
  .saveToCassandra("keyspace", "tableName")

where

def processFunction(src: String): Seq[Any] = src match {
  case "a" => List(A("a", 123112, "b"), A("b", 142342, "c"))
  case "b" => List(B("d", 12312, "e", "f"), B("g", 12312, "h", "i"))
}

and:

case class A(entity: String, time: Long, value: String)
case class B(entity: String, time: Long, value1: String, value2: String)

saveToCassandra expects a collection of objects and …
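The excerpt cuts off, so what follows is only a hedged sketch of one common workaround, not the asker's solution: saveToCassandra derives its column mapping from a concrete row type (a case class, tuple, or CassandraRow), and an element type of Any defeats that derivation, which is where the "<none> is not a term" reflection error tends to surface. Splitting the data into two concretely typed RDDs and writing each to its own table keeps every element type concrete (the contact point and table names below are placeholders):

    import com.datastax.spark.connector._
    import org.apache.spark.{SparkConf, SparkContext}

    case class A(entity: String, time: Long, value: String)
    case class B(entity: String, time: Long, value1: String, value2: String)

    val conf = new SparkConf()
      .setAppName("typed-save-sketch")
      .set("spark.cassandra.connection.host", "127.0.0.1")       // placeholder contact point
    val sc = new SparkContext(conf)

    val rdd = sc.parallelize(Seq("a", "b"))                       // stand-in for the real source RDD

    val as = rdd.collect { case "a" => A("a", 123112L, "b") }     // RDD[A]
    val bs = rdd.collect { case "b" => B("d", 12312L, "e", "f") } // RDD[B]

    as.saveToCassandra("keyspace", "table_a")   // columns derived from case class A
    bs.saveToCassandra("keyspace", "table_b")   // columns derived from case class B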

Unable to serialize SparkContext in foreachRDD

Submitted by 吃可爱长大的小学妹 on 2019-12-09 23:00:53
Question: I am trying to save streaming data from Kafka into Cassandra. I am able to read and parse the data, but when I call the lines below to save it I get a Task not Serializable exception. My class extends Serializable, but I am not sure why I am seeing this error; I didn't get much help even after three hours of searching. Can somebody give me any pointers?

val collection = sc.parallelize(Seq((obj.id, obj.data)))
collection.saveToCassandra("testKS", "testTable", SomeColumns("id", "data"))
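The usual culprit is that sc ends up inside a closure that Spark ships to the executors: sc.parallelize is a driver-only call, so invoking it inside rdd.map/rdd.foreach (or anything else that runs on the executors) pulls the SparkContext into the task closure, and the SparkContext is not serializable. A hedged sketch of the usual structure, assuming the parsed messages form a DStream; everything except the keyspace, table, and column names from the question is invented:

    import com.datastax.spark.connector._
    import org.apache.spark.streaming.dstream.DStream

    case class Record(id: String, data: String)        // assumed shape of a parsed message

    // The foreachRDD body runs on the driver; the closures below capture only the
    // Record values, never the SparkContext, so the tasks stay serializable.
    def saveStream(parsed: DStream[Record]): Unit =
      parsed.foreachRDD { rdd =>
        rdd.map(r => (r.id, r.data))
           .saveToCassandra("testKS", "testTable", SomeColumns("id", "data"))
      }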

Error when using filter(), map(), … in the Spark Java API (org.apache.spark.SparkException)

Submitted by |▌冷眼眸甩不掉的悲伤 on 2019-12-08 06:20:15
Question: I'm new to Spark. When I use filter() through the Spark Java API I get the error below (if I collect() the whole table it works correctly and I can see all the data read from Cassandra). I checked that the master and the workers are on the same version, and when the application starts I can see it in the Spark web UI, but:

[Stage 0:> (0 + 0) / 6]
[Stage 0:> (0 + 2) / 6]
[Stage 0:> (0 + 4) / 6]
2017-08-28 16:37:16,239 ERROR TaskSetManager:70 - Task 1 in stage 0.0 failed 4 times; aborting job
2017-08-28 16:37:21,351 ERROR …

How to save spark streaming data in cassandra

Submitted by 北慕城南 on 2019-12-07 22:03:56
Question: build.sbt — below are the contents included in the build.sbt file:

val sparkVersion = "1.6.3"
scalaVersion := "2.10.5"
resolvers += "Spark Packages Repo" at "https://dl.bintray.com/spark-packages/maven"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming" % sparkVersion,
  "org.apache.spark" %% "spark-streaming-kafka" % sparkVersion)
libraryDependencies += "datastax" % "spark-cassandra-connector" % "1.6.3-s_2.10"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.1.0"

Command …
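One detail worth flagging in the excerpt: spark-sql is pinned to 1.1.0 while everything else is on 1.6.3, and mixed Spark versions on one classpath are a common source of streaming-to-Cassandra failures. A minimal, version-aligned build.sbt sketch (an assumption about the fix, not the asker's final file):

    val sparkVersion = "1.6.3"

    scalaVersion := "2.10.5"

    resolvers += "Spark Packages Repo" at "https://dl.bintray.com/spark-packages/maven"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-streaming"       % sparkVersion,
      "org.apache.spark" %% "spark-streaming-kafka" % sparkVersion,
      "org.apache.spark" %% "spark-sql"             % sparkVersion,      // keep spark-sql on the same Spark version
      "datastax"          % "spark-cassandra-connector" % "1.6.3-s_2.10" // Spark Packages coordinates, resolved via the repo above
    )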

Spark 1.5.1 + Scala 2.10 + Kafka + Cassandra = java.lang.NoSuchMethodError:

Submitted by 早过忘川 on 2019-12-07 12:52:29
I want to connect Kafka and Cassandra to Spark 1.5.1. The library versions:

scalaVersion := "2.10.6"
libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-streaming_2.10" % "1.5.1",
  "org.apache.spark" % "spark-streaming-kafka_2.10" % "1.5.1",
  "com.datastax.spark" % "spark-cassandra-connector_2.10" % "1.5.0-M2"
)

Initialization and use in the app:

val sparkConf = new SparkConf(true)
  .setMaster("local[2]")
  .setAppName("KafkaStreamToCassandraApp")
  .set("spark.executor.memory", "1g")
  .set("spark.cores.max", "1")
  .set("spark.cassandra.connection.host", "127.0.0.1")

Creates schema …
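The excerpt stops before the actual NoSuchMethodError, so only a general, hedged observation: that error almost always means two binary-incompatible versions of the same library meet at runtime, typically because the application jar bundles a Spark build that differs from the one on the cluster. A sketch of the usual dependency hygiene (not the accepted answer to this question), keeping every Spark artifact on the cluster's exact version and marking the ones the cluster already ships as provided:

    scalaVersion := "2.10.6"

    libraryDependencies ++= Seq(
      "org.apache.spark"   %% "spark-core"                % "1.5.1" % "provided", // already on the cluster
      "org.apache.spark"   %% "spark-streaming"           % "1.5.1" % "provided",
      "org.apache.spark"   %% "spark-streaming-kafka"     % "1.5.1",              // not shipped with Spark, so bundle it
      "com.datastax.spark" %% "spark-cassandra-connector" % "1.5.0-M2"
    )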

SSL between Kafka and Spark

Submitted by 拥有回忆 on 2019-12-07 12:03:19
Question: We are using Kafka and Spark Streaming and loading the data into Cassandra. We need to implement a security layer between the nodes running Kafka and the nodes running Spark. Any guidance on how to implement SSL between the Kafka and Spark nodes? Thanks, Sreeni

Source: https://stackoverflow.com/questions/37743490/ssl-between-kafka-and-spark
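SSL on this link is configured on the Kafka side of the connection: the brokers expose an SSL listener, and the Spark executors, acting as Kafka clients, are given the standard Kafka SSL consumer properties. A hedged sketch of those client properties, assuming Kafka 0.9+ brokers with SSL already enabled and the newer direct-stream integration (broker address, paths, and passwords are placeholders):

    import org.apache.kafka.common.serialization.StringDeserializer

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"       -> "broker1:9093",              // the brokers' SSL listener port
      "key.deserializer"        -> classOf[StringDeserializer],
      "value.deserializer"      -> classOf[StringDeserializer],
      "group.id"                -> "spark-consumer",
      "security.protocol"       -> "SSL",
      "ssl.truststore.location" -> "/etc/security/kafka.client.truststore.jks",
      "ssl.truststore.password" -> "changeit",
      "ssl.keystore.location"   -> "/etc/security/kafka.client.keystore.jks", // only needed if brokers require client auth
      "ssl.keystore.password"   -> "changeit",
      "ssl.key.password"        -> "changeit"
    )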

Spark is executing every single action two times

Submitted by 风流意气都作罢 on 2019-12-07 09:42:45
Question: I have created a simple Java application that uses Apache Spark to retrieve data from Cassandra, apply some transformations to it, and save it into another Cassandra table. I am using Apache Spark 1.4.1 configured in standalone cluster mode with a single master and a single slave, both located on my machine.

DataFrame customers = sqlContext.cassandraSql("SELECT email, first_name, last_name FROM customer " +
    "WHERE CAST(store_id as string) = '" + storeId + "'");
DataFrame customersWhoOrderedTheProduct = …
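The excerpt is cut off before the actions, so this is only a hedged aside rather than the answer to this question: a frequent reason for seeing the same expensive read executed more than once is that a DataFrame feeding several actions is never cached, and each action then re-evaluates the whole lineage from Cassandra. A minimal Scala sketch of that standard mitigation (the actions shown are assumptions, not the asker's code):

    import org.apache.spark.sql.DataFrame

    // customers is assumed to be the DataFrame produced by the cassandraSql query above.
    def runActions(customers: DataFrame): Long = {
      customers.cache()              // materialize the Cassandra read once
      val n = customers.count()      // first action triggers the scan
      customers.show(20)             // later actions reuse the cached partitions
      n
    }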

Cassandra Reading Benchmark with Spark

Submitted by 梦想的初衷 on 2019-12-06 10:33:01
Question: I'm benchmarking Cassandra's read performance. In the test-setup step I created clusters with 1, 2, and 4 EC2 instances and the same number of data nodes. I wrote one table with 100 million entries (a ~3 GB CSV file). Then I launch a Spark application which reads the data into an RDD using the spark-cassandra-connector. I expected the behavior to be the following: the more instances Cassandra uses (with the same number of Spark instances), the faster the reads. With the writes everything seems to be …
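For reference, the read side of such a benchmark with the connector usually amounts to a full-table scan driven by a single action; a minimal sketch, with placeholder contact point and keyspace/table names:

    import com.datastax.spark.connector._
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("cassandra-read-benchmark")
      .set("spark.cassandra.connection.host", "10.0.0.1")            // placeholder contact point
    val sc = new SparkContext(conf)

    val rows = sc.cassandraTable("benchmark_ks", "benchmark_table")  // placeholder keyspace/table
    println(rows.count())                                            // count() forces a full scan across all nodes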

How to save spark streaming data in cassandra

Submitted by 五迷三道 on 2019-12-06 10:05:06
build.sbt — below are the contents included in the build.sbt file:

val sparkVersion = "1.6.3"
scalaVersion := "2.10.5"
resolvers += "Spark Packages Repo" at "https://dl.bintray.com/spark-packages/maven"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming" % sparkVersion,
  "org.apache.spark" %% "spark-streaming-kafka" % sparkVersion)
libraryDependencies += "datastax" % "spark-cassandra-connector" % "1.6.3-s_2.10"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.1.0"

Command to initialize shell: the command below is the shell initialization procedure I followed

/usr/hdp/2.6.0 …
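The task in the title, writing a DStream to Cassandra, is normally done through the connector's streaming extension rather than through spark-sql; a minimal hedged sketch, assuming the stream has already been parsed into (id, data) pairs and using placeholder keyspace/table names:

    import com.datastax.spark.connector.SomeColumns
    import com.datastax.spark.connector.streaming._
    import org.apache.spark.streaming.dstream.DStream

    // Hypothetical helper: write every micro-batch of (id, data) pairs to Cassandra.
    def writeStream(pairs: DStream[(String, String)]): Unit =
      pairs.saveToCassandra("test_ks", "test_table", SomeColumns("id", "data"))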

Saving data back into Cassandra as RDD

Submitted by 不打扰是莪最后的温柔 on 2019-12-06 07:14:19
I am trying to read messages from Kafka, process the data, and then add the data into Cassandra as if it is an RDD. My trouble is saving the data back into Cassandra.

from __future__ import print_function
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
from pyspark import SparkConf, SparkContext

appName = 'Kafka_Cassandra_Test'
kafkaBrokers = '1.2.3.4:9092'
topic = 'test'
cassandraHosts = '1,2,3'
sparkMaster = 'spark://mysparkmaster:7077'

if __name__ == "__main__":
    conf = SparkConf()
    conf.set('spark.cassandra.connection.host', cassandraHosts)
    sc = …
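One relevant detail: the RDD-level saveToCassandra method belongs to the DataStax connector's Scala/Java API and is not exposed to PySpark RDDs; from Python the usual route is the DataFrame writer with the connector's data source. A hedged sketch of that write path, shown in Scala to match the rest of this page (the same "org.apache.spark.sql.cassandra" format string and keyspace/table options apply from the PySpark DataFrame API; the names are placeholders):

    import org.apache.spark.sql.{DataFrame, SaveMode}

    // df is assumed to be a DataFrame whose columns match the target Cassandra table.
    def writeToCassandra(df: DataFrame): Unit =
      df.write
        .format("org.apache.spark.sql.cassandra")
        .options(Map("keyspace" -> "test_ks", "table" -> "test_table"))
        .mode(SaveMode.Append)
        .save()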