Spark not able to fetch events from Amazon Kinesis


Question


I have recently been trying to get Spark to read events from Kinesis, but I am having problems receiving the events. While Spark is able to connect to Kinesis and fetch metadata from it, it is not able to get any events; it always fetches zero elements back.

There are no errors, just empty results. Spark is able to get metadata (e.g. the number of shards in Kinesis).

I have used these guides [1 & 2] to get it working but have not had much luck yet. I have also tried a couple of suggestions from SO [3]. The cluster has sufficient resources/cores available.

We have seen a Protobuf version conflict between Spark and Kinesis, which could also be a cause of this behavior: Spark uses protobuf-java version 2.5.0, while Kinesis probably uses protobuf-java-2.6.1.jar.
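To rule the Protobuf conflict in or out, one option in an sbt build is to force a single protobuf-java version across all dependencies. This is only a minimal sketch; whether 2.6.1 is the right version to pin is an assumption, not something I have verified:

// build.sbt (sketch): force one protobuf-java version for every dependency (assumed version)
dependencyOverrides += "com.google.protobuf" % "protobuf-java" % "2.6.1"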

I just wondered if anyone has come across this behavior or has got Spark working with Kinesis.

I have tried with Spark 1.5.0 and Spark 1.6.0.

  1. http://spark.apache.org/docs/latest/streaming-kinesis-integration.html
  2. https://github.com/apache/spark/blob/master/extras/kinesis-asl/src/main/scala/org/apache/spark/examples/streaming/KinesisWordCountASL.scala
  3. Apache Spark Kinesis Sample not working


Answer 1:


Answering my own question:

I have had some success with the Spark Kinesis integration, and the key was unionStreams.foreachRDD.

There are two versions of foreachRDD available:

  • unionStreams.foreachRDD { rdd => ... }
  • unionStreams.foreachRDD ((rdd: RDD[Array[Byte]], time: Time) => { ... })

For some reason the first one did not get me any results, but switching to the second one fetched the results as expected. I have yet to explore the reason.

Adding a code snippet below for reference.

Also consider changing this dependency; it helped me as well:

"org.apache.spark" % "spark-streaming-kinesis-asl_2.10" % "1.6.0", // Doesnt work
"org.apache.spark" % "spark-streaming-kinesis-asl_2.10" % "1.4.1",  // Works

Hope it helps someone :)

Thanks everyone for the help.

// Imports needed by this snippet
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.Time
import org.apache.spark.streaming.kinesis.KinesisUtils

// One Kinesis receiver (DStream) per shard, then union them into a single DStream
val kinesisStreams = (0 until numStreams).map { count =>
  KinesisUtils.createStream(
    ssc,
    consumerName,                           // Kinesis application (checkpoint) name
    streamName,
    endpointUrl,
    regionName,
    InitialPositionInStream.TRIM_HORIZON,   // start from the oldest available record
    kinesisCheckpointInterval,
    StorageLevel.MEMORY_AND_DISK_2
  )
}
val unionStreams = ssc.union(kinesisStreams)

println("========================")
println(s"Num of streams: ${numStreams}")
println("========================")

/*unionStreams.foreachRDD { // Doesn't work !!
  rdd =>
    println(rdd.count)
    println("rdd isempty:" + rdd.isEmpty)
}*/
unionStreams.foreachRDD ((rdd: RDD[Array[Byte]], time: Time) => { // Works, yeah !!
  println(rdd.count)
  println("rdd isempty:" + rdd.isEmpty)
})

ssc.start()
ssc.awaitTermination()
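As a follow-up sketch that is not part of the original answer: if the producers write UTF-8 text payloads (an assumption), the Array[Byte] records can be decoded inside the same foreachRDD variant, registered before ssc.start() like the block above:

// Sketch: decode a few records per batch, assuming UTF-8 text payloads
unionStreams.foreachRDD ((rdd: RDD[Array[Byte]], time: Time) => {
  rdd.map(bytes => new String(bytes, java.nio.charset.StandardCharsets.UTF_8))
    .take(5)                                  // peek at up to five records
    .foreach(record => println(s"[$time] $record"))
})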


Source: https://stackoverflow.com/questions/35567440/spark-not-able-to-fetch-events-from-amazon-kinesis
