I use Spark 2.2.0-rc1.
I've got a Kafka topic which I'm querying with a running watermarked aggregation, with a 1 minute watermark, giving out to
Here's my best guess:
Append mode only outputs data after the watermark has passed (in this case, 1 minute later). You didn't set a trigger (e.g. .trigger(Trigger.ProcessingTime("10 seconds"))), so by default it outputs batches as fast as possible. So for the first minute all your batches should be empty, and the first batch after a minute should contain some content.
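For reference, here is a minimal sketch of setting an explicit trigger on an append-mode query. The object and method names, the `aggregated` parameter, and the console sink are assumptions made for illustration; they are not taken from the question.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.streaming.{StreamingQuery, Trigger}

object TriggerSketch {
  // `aggregated` is a hypothetical streaming DataFrame that already has
  // the watermark and the aggregation applied.
  def startAppendQuery(aggregated: DataFrame): StreamingQuery =
    aggregated.writeStream
      .outputMode("append")  // emit a result only after the watermark has passed it
      .format("console")     // assumed sink, chosen for illustration
      .trigger(Trigger.ProcessingTime("10 seconds")) // start a micro-batch every 10 seconds
      .start()
}
```

With a fixed interval like this it should be easier to see which batches are empty and when the first non-empty one shows up.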
Another possibility is that you're using groupBy("time") instead of groupBy(window("time", "[window duration]")). I believe watermarks are meant to be used with time windows or mapGroupsWithState, so I'm not sure how the interaction works in this case.
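A sketch of the windowed variant, in case it helps. It assumes the input has a timestamp column named `time`, and it uses a 1 minute window purely as a placeholder for your actual window duration; the object and method names are hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, window}

object WindowedCountSketch {
  // `events` is a hypothetical streaming DataFrame with a timestamp column `time`.
  def countPerWindow(events: DataFrame): DataFrame =
    events
      .withWatermark("time", "1 minute")         // tolerate events up to 1 minute late
      .groupBy(window(col("time"), "1 minute"))  // placeholder window duration
      .count()
}
```

Grouping by the window rather than the raw timestamp gives each group a definite end time, which is what the watermark is compared against when deciding that a result is final.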