问题
In a streaming beam pipeline, a trigger is set to be
Window.into(FixedWindows.of(Duration.standardHours(1)))
.triggering(AfterWatermark
.pastEndOfWindow()
.withEarlyFirings(AfterProcessingTime
.pastFirstElementInPane()
.plusDelayOf(Duration.standardMinutes(15))))
.withAllowedLateness(Duration.standardHours(1))
.accumulatingFiredPanes())
If there's no new data between the early firing (15 minutes after the first element of the current window) and the watermark, will there be another firing at the end of the watermark?
If yes, under the same scenario, will there be another firing at the end of the watermark if
accumulatingFiredPanesis changed todiscardingFiredPanes?
回答1:
Yes. There should always be a firing when the watermark passes the end of the window. The early firing panes will be marked as early, and the watermark pane will be marked as on time.
Yes, currently we always guarantee an on_time pane, which means there will be a firing at the end of the watermark.
回答2:
For #2, you can set the Window.ClosingBehavior as second parameter to withAllowedLateness. There are two variants:
- FIRE_ALWAYS
- FIRE_IF_NON_EMPTY
See https://beam.apache.org/releases/javadoc/2.6.0/org/apache/beam/sdk/transforms/windowing/Window.ClosingBehavior.html
来源:https://stackoverflow.com/questions/47760273/apache-beam-number-of-times-a-pane-is-fired-with-early-triggers