I use Spark 1.6.0 with Cloudera 5.8.3.
I have a DStream object and plenty of transformations defined on top of it,
val stream = KafkaUtils.c
Start a stream named myStreamName and wait for it to start up:
deltaStreamingQuery = (streamingDF
    .writeStream
    .format("delta")
    .queryName(myStreamName)
    .start(writePath)
)
untilStreamIsReady(myStreamName)
PySpark version of waiting for the stream to start up:
import time

def getActiveStreams():
    try:
        return spark.streams.active
    except Exception:
        # In extreme cases, listing the streams may raise an ignorable error.
        print("Unable to iterate over all active streams - using an empty set instead.")
        return []

def untilStreamIsReady(name, progressions=3):
    # The stream counts as ready once it has reported `progressions` progress updates.
    queries = list(filter(lambda query: query.name == name, getActiveStreams()))
    while len(queries) == 0 or len(queries[0].recentProgress) < progressions:
        time.sleep(5)  # Give it a couple of seconds
        queries = list(filter(lambda query: query.name == name, getActiveStreams()))
    print("The stream {} is active and ready.".format(name))
Spark Scala version of waiting for the stream to start up:
def getActiveStreams(): Seq[org.apache.spark.sql.streaming.StreamingQuery] = {
  try {
    spark.streams.active
  } catch {
    case e: Throwable => {
      // In extreme cases, this function may throw an ignorable error.
      println("Unable to iterate over all active streams - using an empty set instead.")
      Seq[org.apache.spark.sql.streaming.StreamingQuery]()
    }
  }
}

def untilStreamIsReady(name: String, progressions: Int = 3): Unit = {
  var queries = getActiveStreams().filter(_.name == name)
  while (queries.length == 0 || queries(0).recentProgress.length < progressions) {
    Thread.sleep(5 * 1000) // Give it a couple of seconds
    queries = getActiveStreams().filter(_.name == name)
  }
  println("The stream %s is active and ready.".format(name))
}
As for the original question: add a second function that is the inverse of this one. First wait for the stream to start up, then wait again (with the wait condition negated) for it to finish. The complete version would look something like this:
untilStreamIsReady(myStreamName)
untilStreamIsDone(myStreamName) // inverse of untilStreamIsReady: wait until myStreamName is no longer in the list of active streams
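untilStreamIsDone is not written out above, so here is a minimal PySpark-side sketch of what it could look like. Its signature is an assumption, not part of the original answer: it takes the stream lister as a second argument (for example the getActiveStreams helper above), and the pollSeconds and timeoutSeconds parameters are hypothetical additions so the loop can be bounded. Passing the lister in also lets the function be tried without a cluster.

```python
import time

def untilStreamIsDone(name, listStreams, pollSeconds=5, timeoutSeconds=None):
    """Block until no active query is called `name`.

    `listStreams` is any zero-argument callable returning objects with a
    `.name` attribute (assumed here: the getActiveStreams helper).
    Returns True once the stream is gone, False if the optional timeout
    expires first.
    """
    deadline = None if timeoutSeconds is None else time.monotonic() + timeoutSeconds
    # Negated condition of untilStreamIsReady: loop while the stream is still listed.
    while any(query.name == name for query in listStreams()):
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(pollSeconds)
    print("The stream {} is no longer active.".format(name))
    return True
```

With the helpers above this would be called as untilStreamIsDone(myStreamName, getActiveStreams). Keep in mind that a query only leaves the active list when it is stopped, fails, or runs with a one-shot trigger, so without a timeout this can block indefinitely. If you still hold the query handle, deltaStreamingQuery.awaitTermination() is a simpler way to block on the same condition.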