How to save Spark Streaming data in Cassandra


java.lang.NoClassDefFoundError: Could not initialize class com.datastax.spark.connector.cql.CassandraConnector

This means the classpath has not been set up correctly for your application. Make sure you are using the --packages option when launching your application, as noted in the Spark Cassandra Connector (SCC) docs.
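For example, a launch command along these lines (the main class and application jar are placeholders; the connector version matches the one used later in this post):

spark-submit --class com.example.StreamingApp \
  --packages datastax:spark-cassandra-connector:1.6.3-s_2.10 \
  --conf spark.cassandra.connection.host=127.0.0.1 \
  my-streaming-app.jar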

For your other issues

You don't need awaitTermination in the REPL because the REPL will not quit immediately after the streaming context is started. That call is there for a standalone application, which may have no further instructions, to prevent the main thread from exiting.

Calling start() begins the streaming immediately.
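As a minimal sketch of the standalone case (the object name is hypothetical), the pattern looks like this:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Spark-Kafka-Streaming")
    val ssc = new StreamingContext(conf, Seconds(10))
    // ... define the DStream pipeline here ...
    ssc.start()              // starts processing immediately
    ssc.awaitTermination()   // keeps the main thread alive; not needed in the REPL
  }
}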

A line or two of context-related code was causing the issue here!

I found the solution when I walked through the topic of contexts.

I was running multiple contexts, but they were independent of each other.

I had initialized the shell with the command below:

/usr/hdp/2.6.0.3-8/spark/bin/spark-shell --packages datastax:spark-cassandra-connector:1.6.3-s_2.10 --conf spark.cassandra.connection.host=127.0.0.1 --jars spark-streaming-kafka-assembly_2.10-1.6.3.jar

So when the shell starts, a SparkContext with the DataStax connector properties is initialized.

Later I created some configurations and, using those configurations, created a StreamingContext. Using this context I created a kafkaStream. That kafkaStream only carried the properties of the StreamingContext, not of the SparkContext with the Cassandra settings, which is why storing to Cassandra failed.

I resolved it as shown below and it worked!


import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
val sc = new SparkContext(new SparkConf().setAppName("Spark-Kafka-Streaming").setMaster("local[*]").set("spark.cassandra.connection.host", "127.0.0.1"))
val ssc = new StreamingContext(sc, Seconds(10))
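For completeness, a minimal sketch of wiring a Kafka stream off that ssc and writing it out with the connector's saveToCassandra (the ZooKeeper address, consumer group, topic, keyspace, table, and column names below are placeholders):

import org.apache.spark.streaming.kafka.KafkaUtils
import com.datastax.spark.connector.SomeColumns
import com.datastax.spark.connector.streaming._

// Kafka stream created from the same ssc that wraps the Cassandra-aware sc
val kafkaStream = KafkaUtils.createStream(ssc, "localhost:2181", "spark-group", Map("tweets" -> 1))

// Each (key, message) pair is written to the listed columns in order
kafkaStream.saveToCassandra("my_keyspace", "my_table", SomeColumns("key", "value"))

ssc.start()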

Thanks to everyone who came forward to help! Let me know if there are better ways to achieve this!

A very simple approach is to use the foreachRDD API on the stream, convert each RDD to a DataFrame, and save it to Cassandra using the Spark SQL Cassandra data source API. Below is a simple code snippet where I save Twitter tweets to a Cassandra table.

// toDF() below requires the SQLContext implicits (e.g. import sqlContext.implicits._)
stream.foreachRDD(rdd => {
  if (rdd.count() > 0) {
    val data = rdd.filter(status => status.getLang.equals("en")).map(status => TweetsClass(status.getId,
      status.getCreatedAt.toGMTString(),
      status.getUser.getLocation,
      status.getText)).toDF()
    //Save the data to Cassandra
    data.write.
      format("org.apache.spark.sql.cassandra").
      options(Map(
        "table" -> "sentiment_tweets",
        "keyspace" -> "My Keyspace",
        "cluster" -> "My Cluster")).mode(SaveMode.Append).save()

  }
})
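The snippet assumes a case class like the one below. Its fields are not shown in the post, so this is an illustrative definition matching the values mapped above (a Long tweet id, a created-at string, a location string, and the tweet text):

// Illustrative case class matching the fields mapped in the snippet above
case class TweetsClass(id: Long, createdAt: String, location: String, text: String)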