How to convert streaming Dataset to DStream?

Submitted by 微笑、不失礼 on 2019-12-07 12:03:22

Question


Is it possible to convert a streaming o.a.s.sql.Dataset to DStream? If so, how?

I know how to convert a Dataset to an RDD, but here it is in a streaming context.


Answer 1:


It is not possible. Structured Streaming and legacy Spark Streaming (DStreams) use completely different semantics and are not compatible with each other, so:

  • DStream cannot be converted to Streaming Dataset.
  • Streaming Dataset cannot be converted to DStream.



Answer 2:


It could be possible (in some use cases).

That question really begs another:

Why would anyone want to do that conversion? What's the problem to be solved?

I can only imagine that such a type conversion would be required when mixing two different APIs in a single streaming application. I'd say that does not make much sense: rather than converting between types, you'd make the conversion at the Spark module level, i.e. migrate the streaming application from Spark Streaming to Spark Structured Streaming.

A streaming Dataset is an "abstraction" of a series of Datasets (I use quotes since the difference between streaming and batch Datasets is the isStreaming property of a Dataset).
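To make the isStreaming point concrete, here is a minimal sketch that compares a batch Dataset with a streaming one. The object name, the `flags` helper, the local master and the choice of the built-in "rate" source are assumptions made for illustration; only `isStreaming` itself comes from the answer.

```scala
import org.apache.spark.sql.SparkSession

object IsStreamingSketch {
  // Returns (isStreaming of a batch Dataset, isStreaming of a streaming Dataset).
  def flags(spark: SparkSession): (Boolean, Boolean) = {
    // A batch Dataset: a fixed range of numbers.
    val batch = spark.range(5)
    // A streaming Dataset: the built-in "rate" source (chosen here only for illustration).
    val streaming = spark.readStream.format("rate").load()
    (batch.isStreaming, streaming.isStreaming)
  }

  def main(args: Array[String]): Unit = {
    // local[*] and the app name are arbitrary choices for this sketch.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("is-streaming-sketch")
      .getOrCreate()
    println(flags(spark)) // prints (false,true)
    spark.stop()
  }
}
```

Both values have the same Dataset type at compile time; the flag is the only thing that tells them apart from the outside.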

It is possible to convert a DStream to a streaming Dataset so the latter behaves as the former (to keep the behaviour of the DStream and pretend to be a streaming Dataset).

Under the covers, the execution engines of Spark Streaming (DStream) and Spark Structured Streaming (streaming Dataset) are fairly similar. They both "generate" micro-batches, of RDDs and Datasets respectively. And RDDs are convertible to Datasets through the implicit conversions toDF or toDS.

So converting a DStream to a streaming Dataset would logically look as follows:

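// Assumes an existing SparkSession named `spark`; its implicits provide rdd.toDF
import spark.implicits._

dstream.foreachRDD { rdd =>
  val df = rdd.toDF
  // this df is a batch (non-streaming) DataFrame for one micro-batch,
  // but you don't really need it to be streaming
}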
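The same per-micro-batch idea can be sketched in the direction the question asks about. This is not part of the original answer: it assumes Spark 2.4 or later, where `DataStreamWriter.foreachBatch` hands each micro-batch of a streaming Dataset to a function as a regular Dataset, which can then be turned into an RDD with `.rdd`. It does not produce a DStream. The object name, the `handleBatch` function, the "rate" source and the 10-second run time are illustrative choices.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ForeachBatchSketch {
  // Handles one micro-batch; `batch` is a regular, non-streaming DataFrame.
  // Declared as a function value so the Scala overload of foreachBatch is selected.
  val handleBatch: (DataFrame, Long) => Unit = (batch, batchId) => {
    val rdd = batch.rdd // RDD[Row] for this micro-batch
    println(s"batch $batchId has ${rdd.count()} rows")
  }

  def main(args: Array[String]): Unit = {
    // local[*] and the app name are arbitrary choices for this sketch.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("foreach-batch-sketch")
      .getOrCreate()

    // The built-in "rate" source stands in for a real streaming source.
    val streaming: DataFrame = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "5")
      .load()

    val query = streaming.writeStream.foreachBatch(handleBatch).start()
    query.awaitTermination(10000L) // let it run for about 10 seconds, then shut down
    query.stop()
    spark.stop()
  }
}
```

As with foreachRDD, the RDD only exists inside the callback, so any DStream-style logic has to be written against one micro-batch at a time.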


Source: https://stackoverflow.com/questions/49559007/how-to-convert-streaming-dataset-to-dstream
