How can I get GroupByKey to trigger early results, rather than wait for all the data to arrive (which in my case is a pretty long time).I tried to split my input PCollection into windows with an early trigger, but it just doesn`t work. It still waits for all the data to arrive before giving out the results.
PCollection<List<String>> input = ...
PCollection<KV<Integer,List<String>>> keyedInput = input.apply(ParDo.of(new AddArbitraryKey()))
keyedInput.apply(Window<KV<Integer,List<String>>>into(
FixedWindows.of(Duration.standardSeconds(1)))
.triggering(Repeatedly.forever(AfterWatermark.pastEndOfWindow()))
.withAllowedLateness(Duration.ZERO).discardingFiredPanes())
.apply(GroupByKey.<Integer,List<String>>create())
.apply(ParDo.of(new RemoveArbitraryKey()))
.apply(ParDo.of(new FurtherProcessing())
I am doing this to prevent fusing . The AddArbitraryKey transform outputs its elements with Timestamp. However, GroupByKey holds up everything until all the data arrives (for all the windows) . Could someone please tell me how i can get it to trigger early. Thank You .
You can install a trigger like
Repeatedly
.forever(AfterProcessingTime
.pastFirstElementInPane()
.plusDuration(Duration.standardMinutes(1))
.orFinally(AfterWatermark.pastEndOfWindow())
.discardingFiredPanes()
Or
AfterWatermark.pastEndOfWindow()
.withEarlyFirings(
AfterProcessingTime
.pastFirstElementInPane()
.plusDuration(Duration.standardMinutes(1))
To prevent fusion, it's better to use the transform Reshuffle.viaRandomKey() which performs better and makes sure to not introduce any additional triggering delays.
来源:https://stackoverflow.com/questions/48886943/early-results-from-groupbykey-transform