How many RDDs does a DStream generate for a batch interval?

礼貌的吻别 2021-02-19 20:39

Does one batch interval of data generate exactly one RDD in a DStream, regardless of how much data arrives?

3 Answers
  • 2021-02-19 21:02

    The Spark Streaming Programming Guide, in the section Discretized Streams (DStreams), states:

    Each RDD in a DStream contains data from a certain interval

  • It's late to reply to this thread, but a few more points are worth adding. The number of RDDs per batch depends on how many receivers your application has: each receiver backs its own input DStream, and each input DStream produces its own RDD for the batch interval, so an application with several receivers sees several RDDs per batch. With only one receiver, or with Kafka as a receiver-less (direct) source, you get only one RDD per batch.

  • 2021-02-19 21:19

    Yes, there is exactly one RDD per batch interval. It is produced at every batch interval, independent of the number of records it contains (it may well contain zero records).

    If that were not the case, and RDD creation depended on the number of elements, you would not have synchronous (micro-batch) streaming but rather a form of asynchronous processing.
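
To make the point about empty batches concrete, here is a toy model in plain Python. It is not Spark code and does not use Spark's API: the function `discretize`, the sample `events`, and the millisecond timestamps are all made up for illustration. It only mimics the idea that a time-sliced stream yields one batch per interval, whether or not any records arrived.

```python
def discretize(events, batch_interval_ms, start_ms, end_ms):
    """Group (timestamp_ms, value) events into one batch per interval.

    Hypothetical helper: a stand-in for how a single DStream slices time,
    with a plain list playing the role of each batch's RDD.
    """
    n_batches = (end_ms - start_ms) // batch_interval_ms
    # One batch is created per interval up front, so empty intervals still get one.
    batches = [[] for _ in range(n_batches)]
    for ts, value in events:
        if start_ms <= ts < end_ms:
            idx = (ts - start_ms) // batch_interval_ms
            if idx < n_batches:  # ignore a trailing partial interval
                batches[idx].append(value)
    return batches


# Three records land in the first second, none in the second, one in the third.
events = [(200, "a"), (700, "b"), (900, "c"), (2100, "d")]
batches = discretize(events, batch_interval_ms=1000, start_ms=0, end_ms=4000)

for i, batch in enumerate(batches):
    print(f"batch {i}: {len(batch)} record(s) -> {batch}")
```

Four intervals give four batches, two of them empty. The number of batches is fixed by the clock, not by the volume of data.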
