Early results from GroupByKey transform

戏子无情 submitted on 2019-12-06 06:41:17

Question


How can I get GroupByKey to trigger early results rather than wait for all the data to arrive (which in my case takes a pretty long time)? I tried to split my input PCollection into windows with an early trigger, but it just doesn't work. It still waits for all the data to arrive before emitting any results.

PCollection<List<String>> input = ...
PCollection<KV<Integer, List<String>>> keyedInput =
    input.apply(ParDo.of(new AddArbitraryKey()));
keyedInput
    .apply(Window.<KV<Integer, List<String>>>into(
            FixedWindows.of(Duration.standardSeconds(1)))
        .triggering(Repeatedly.forever(AfterWatermark.pastEndOfWindow()))
        .withAllowedLateness(Duration.ZERO)
        .discardingFiredPanes())
    .apply(GroupByKey.<Integer, List<String>>create())
    .apply(ParDo.of(new RemoveArbitraryKey()))
    .apply(ParDo.of(new FurtherProcessing()));

I am doing this to prevent fusion. The AddArbitraryKey transform outputs its elements with timestamps. However, GroupByKey holds everything back until all the data (for all the windows) has arrived. Could someone please tell me how I can get it to trigger early? Thank you.


Answer 1:


You can install a trigger on the Window transform, for example:

.triggering(Repeatedly
    .forever(AfterProcessingTime
      .pastFirstElementInPane()
      .plusDelayOf(Duration.standardMinutes(1)))
    .orFinally(AfterWatermark.pastEndOfWindow()))
.discardingFiredPanes()

Or

AfterWatermark.pastEndOfWindow()
  .withEarlyFirings(
    AfterProcessingTime
      .pastFirstElementInPane()
      .plusDelayOf(Duration.standardMinutes(1)))
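
To get a feel for what a processing-time early firing does in discarding mode, here is a small plain-Java toy model. It is not Beam code and not how Beam implements triggers; the class and method names (EarlyFiringSimulation, panes) are made up for illustration. It assumes arrival times are given in ascending order as seconds of processing time, that a pane fires a fixed delay after its first element arrives, and that whatever is left over is emitted in one final pane, which stands in for the end-of-window watermark firing.

```java
import java.util.ArrayList;
import java.util.List;

class EarlyFiringSimulation {
    /**
     * Toy model (hypothetical helper, not part of Beam): groups arrival times
     * into the panes an early-firing trigger would emit in discarding mode.
     */
    static List<List<Long>> panes(long[] arrivals, long delaySeconds) {
        List<List<Long>> fired = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long fireAt = 0;
        for (long t : arrivals) {
            if (!current.isEmpty() && t >= fireAt) {
                fired.add(current);          // the timer went off before this element arrived
                current = new ArrayList<>(); // discarding mode: the next pane starts empty
            }
            if (current.isEmpty()) {
                fireAt = t + delaySeconds;   // the timer is set by the first element in the pane
            }
            current.add(t);
        }
        if (!current.isEmpty()) {
            fired.add(current);              // final pane, standing in for the watermark firing
        }
        return fired;
    }

    public static void main(String[] args) {
        // Elements arrive at 0s, 10s, 70s, 80s and 200s; early firing delay is 60s.
        System.out.println(panes(new long[] {0, 10, 70, 80, 200}, 60));
    }
}
```

With a 60-second delay, the elements that arrive at 0s and 10s are emitted together, then those at 70s and 80s, and the element at 200s goes out in the final pane. The point is that downstream steps see partial groups as time passes instead of one group at the very end.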



Answer 2:


To prevent fusion, it is better to use the Reshuffle.viaRandomKey() transform, which performs better and does not introduce any additional triggering delays.
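
A minimal sketch of that suggestion, assuming the Apache Beam Java SDK is on the classpath. The wrapper class BreakFusion and its method name are hypothetical and only there to make the snippet self-contained; the one Beam call that matters is Reshuffle.viaRandomKey().

```java
import org.apache.beam.sdk.transforms.Reshuffle;
import org.apache.beam.sdk.values.PCollection;

final class BreakFusion {
  /**
   * Hypothetical helper: inserts a reshuffle so that the steps before and
   * after it are not fused into a single stage.
   */
  static <T> PCollection<T> reshuffled(PCollection<T> input) {
    // Replaces the manual AddArbitraryKey -> Window -> GroupByKey -> RemoveArbitraryKey chain.
    return input.apply(Reshuffle.<T>viaRandomKey());
  }
}
```

In the pipeline from the question this would be used as BreakFusion.reshuffled(input).apply(ParDo.of(new FurtherProcessing())), with no custom keying, windowing, or trigger setup needed just to break fusion.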



Source: https://stackoverflow.com/questions/48886943/early-results-from-groupbykey-transform
