spark-streaming

Spark Streaming not reading from Kafka topics

Submitted by 孤街醉人 on 2021-02-20 02:49:25
Question: I have set up Kafka and Spark on Ubuntu. I am trying to read Kafka topics through Spark Streaming using pyspark (Jupyter notebook). Spark is neither reading the data nor throwing any error. The Kafka producer and consumer are communicating with each other on the terminal. Kafka is configured with 3 partitions on ports 9092, 9093, and 9094, and messages are getting stored in the Kafka topics. Now I want to read them through Spark Streaming. I am not sure what I am missing. I have even searched for this on the internet, but…
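
A minimal sketch of such a consumer with the spark-streaming-kafka-0-10 direct stream (the question uses pyspark, but the API is analogous; the topic name, group id, and broker list below are assumptions):

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object KafkaRead {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-read").setMaster("local[*]")
    val ssc = new StreamingContext(conf, Seconds(5))

    val kafkaParams = Map[String, Object](
      // all three brokers from the question's setup (addresses are assumptions)
      "bootstrap.servers" -> "localhost:9092,localhost:9093,localhost:9094",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "spark-consumer",
      // "earliest" makes messages already stored in the topic visible;
      // the default "latest" silently shows nothing unless new messages arrive
      "auto.offset.reset" -> "earliest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Set("my-topic"), kafkaParams)
    )

    stream.map(_.value).print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```

A common reason for "no data and no error" is exactly the offset setting commented above: with the default of latest, a job started after the producer finishes will sit idle without complaining.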

Spark Streaming MQTT

Submitted by 心已入冬 on 2021-02-18 19:12:20
Question: I've been using Spark to stream data from Kafka, and it's pretty easy. I thought using the MQTT utils would also be easy, but for some reason it is not. I'm trying to execute the following piece of code:

```scala
val sparkConf = new SparkConf(true).setAppName("amqStream").setMaster("local")
val ssc = new StreamingContext(sparkConf, Seconds(10))
val actorSystem = ActorSystem()
implicit val kafkaProducerActor = actorSystem.actorOf(Props[KafkaProducerActor])
MQTTUtils.createStream(ssc, "tcp://localhost…
```
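
The snippet is truncated, so the actual failure is not visible, but one classic pitfall is already present: MQTTUtils.createStream is receiver-based, and setMaster("local") provides a single thread that the receiver occupies, leaving none for processing. A minimal sketch with that fixed (the broker URL and topic are assumptions):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.mqtt.MQTTUtils

object AmqStream {
  def main(args: Array[String]): Unit = {
    // "local[2]" reserves one thread for the MQTT receiver and one for
    // processing; with plain "local" the receiver consumes the only thread
    // and batches are never processed.
    val sparkConf = new SparkConf(true).setAppName("amqStream").setMaster("local[2]")
    val ssc = new StreamingContext(sparkConf, Seconds(10))

    // Broker URL and topic name are assumptions for illustration.
    val lines = MQTTUtils.createStream(ssc, "tcp://localhost:1883",
      "sensors/temperature", StorageLevel.MEMORY_AND_DISK_SER_2)
    lines.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```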

Apache Spark Kinesis Integration: connected, but no records received

Submitted by 為{幸葍}努か on 2021-02-16 08:30:23
Question: tl;dr: I can't use the Kinesis Spark Streaming integration because it receives no data.

- A test stream is set up, and a Node.js app sends 1 simple record per second.
- A standard Spark 1.5.2 cluster with master and worker nodes (4 cores) is set up with docker-compose, with AWS credentials in the environment.
- spark-streaming-kinesis-asl-assembly_2.10-1.5.2.jar is downloaded and added to the classpath.
- job.py or job.jar (which just reads and prints) is submitted.

Everything seems to be okay, but no records whatsoever are received.
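
For reference, a sketch of a Scala equivalent under the same setup, using the Spark 1.5-era KinesisUtils API (app name, stream name, endpoint, and region are assumptions):

```scala
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils

object KinesisTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kinesis-test") // master set via spark-submit
    val ssc = new StreamingContext(conf, Seconds(1))

    // The Kinesis receiver occupies one core, so the job needs more cores than
    // receivers or nothing is processed. TRIM_HORIZON replays records already
    // in the stream; LATEST only shows records sent after the receiver is up.
    val records = KinesisUtils.createStream(
      ssc, "kinesis-test-app", "test-stream",
      "https://kinesis.us-east-1.amazonaws.com", "us-east-1",
      InitialPositionInStream.TRIM_HORIZON, Seconds(1),
      StorageLevel.MEMORY_AND_DISK_2)

    records.map(bytes => new String(bytes, "UTF-8")).print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```

One known KCL gotcha worth checking: shard checkpoints are stored in a DynamoDB table named after the application name, so a stale table left by an earlier run can pin the receiver to old positions even when the stream is flowing.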

Error when connecting spark structured streaming + kafka

Submitted by 别说谁变了你拦得住时间么 on 2021-02-11 15:45:49
Question: I'm trying to connect my Structured Streaming Spark 2.4.5 application to Kafka, but every time I try, a Data Source Provider error appears. Here are my Scala code and my sbt build:

```scala
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.streaming.Trigger

object streaming_app_demo {
  def main(args: Array[String]): Unit = {
    println("Spark Structured Streaming with Kafka Demo Application Started ...")
    val KAFKA_TOPIC…
```
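
If the error is the common "Failed to find data source: kafka", the usual cause is that the Kafka source ships in an artifact separate from spark-sql. A sketch of the dependency plus a minimal reader, assuming Spark 2.4.5 (the topic and broker address are made up):

```scala
// build.sbt: the Kafka source is not bundled with spark-sql and must be added
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"            % "2.4.5",
  "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.4.5"
)
```

```scala
import org.apache.spark.sql.SparkSession

object StreamingAppDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("streaming_app_demo")
      .master("local[*]")
      .getOrCreate()

    // Topic and broker address are assumptions.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "test-topic")
      .load()

    df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("console")
      .start()
      .awaitTermination()
  }
}
```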

Spark streaming - consuming message from socket and processing: Null Pointer Exception

Submitted by 大城市里の小女人 on 2021-02-11 14:21:48
Question: I need to receive a message from a socket using Spark Streaming, read the file from the filePath specified in the message, and write it to the fileDst. Message from socket:

```json
{"fileName": "sampleFile.dat", "filePath": "/Users/Desktop/test/abc1.dat", "fileDst": "/Users/Desktop/git/spark-streaming-poc/src/main/resourcs/samplefile2"}
```

Error:

```
java.lang.NullPointerException
  at org.apache.spark.sql.execution.SparkPlan.sparkContext(SparkPlan.scala:56)
  at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.metrics…
```
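
This stack trace typically shows up when DataFrame operations end up executing on an executor, where the session's SparkContext is null. A sketch of the driver-side pattern, assuming the control messages are small enough to collect (host, port, and field names follow the message above):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SocketFileMover {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("socket-file-mover").master("local[2]").getOrCreate()
    import spark.implicits._
    val ssc = new StreamingContext(spark.sparkContext, Seconds(5))

    val messages = ssc.socketTextStream("localhost", 9999)
    messages.foreachRDD { rdd =>
      // Bring the small control messages back to the driver: spark.read may
      // only be called on the driver, and invoking it inside rdd.map would run
      // on executors, producing exactly this kind of NullPointerException.
      rdd.collect().foreach { msg =>
        val meta = spark.read.json(Seq(msg).toDS()).first()
        val src  = meta.getAs[String]("filePath")
        val dst  = meta.getAs[String]("fileDst")
        spark.read.textFile(src).write.mode("overwrite").text(dst)
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```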

How to parse dynamic Json with dynamic keys inside it in Scala

Submitted by 一个人想着一个人 on 2021-02-11 12:56:58
Question: I am trying to parse a JSON structure that is dynamic in nature and load it into a database, but I am facing difficulty where the JSON has dynamic keys inside it. Below is my sample JSON. I have tried using the explode function, but it didn't help. A mostly similar problem is described in How to parse a dynamic JSON key in a Nested JSON result?

```json
{
  "_id": {
    "planId": "5f34dab0c661d8337097afb9",
    "version": { "$numberLong": "1" },
    "period": { "name": "3Q20", "startDate": 20200629, "endDate": 20200927 },
    "line": "b443e9c0…
```
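
A minimal sketch of the usual approach when object keys are not known up front: declare the field as a MapType so every dynamic key becomes a map entry, which explode can then pivot into rows (the field name "metrics" and the sample values are assumptions, not from the question's document):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object DynamicJsonKeys {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("dynamic-json").master("local[*]").getOrCreate()
    import spark.implicits._

    // Hypothetical payload with dynamic keys ("3Q20", "4Q20", ...).
    val raw = Seq("""{"metrics": {"3Q20": 10, "4Q20": 25}}""").toDS()

    // MapType(StringType, ...) accepts arbitrary keys, unlike a StructType,
    // whose field names would have to be listed in advance.
    val schema = new StructType().add("metrics", MapType(StringType, IntegerType))

    spark.read.schema(schema).json(raw)
      .select(explode($"metrics").as(Seq("period", "value")))
      .show() // yields one row per dynamic key: (3Q20, 10), (4Q20, 25)
  }
}
```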

Spark Streaming: How Spark and Kafka communication happens?

Submitted by 萝らか妹 on 2021-02-11 07:46:14
Question: I would like to understand how the communication between the Kafka and Spark (Streaming) nodes takes place. I have the following questions:

1. If the Kafka servers and Spark nodes are in two separate clusters, how does communication take place, and what steps are needed to configure them?
2. If both are in the same cluster but on different nodes, how does communication happen?

By communication I mean whether it is RPC or socket communication. I would like to understand the internal…
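
For what it's worth, Spark's Kafka integration does not use Spark's internal RPC to talk to brokers: each executor task opens an ordinary Kafka consumer, i.e. a TCP connection speaking the Kafka wire protocol, so the only cross-cluster requirement is network reachability to the brokers. A sketch of that client, the same one Spark embeds (broker addresses and topic are assumptions):

```scala
import java.util.Properties
import org.apache.kafka.clients.consumer.KafkaConsumer
import scala.collection.JavaConverters._

object KafkaWireDemo {
  def main(args: Array[String]): Unit = {
    // Whether or not Kafka and Spark share a cluster, this client-to-broker
    // TCP connection is the communication channel; Spark's own RPC is only
    // used between its driver and executors.
    val props = new Properties()
    props.put("bootstrap.servers", "kafka-node1:9092,kafka-node2:9092")
    props.put("group.id", "demo-group")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Seq("my-topic").asJava)
    val records = consumer.poll(java.time.Duration.ofSeconds(1))
    records.asScala.foreach(r => println(s"${r.key}: ${r.value}"))
    consumer.close()
  }
}
```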
