Remove duplicates across window triggers/firings

时光取名叫无心 2020-12-20 01:36

Let's say I have an unbounded PCollection of sentences keyed by userid, and I want a constantly updated value for whether the user is annoying, we can calculate whether a u

1 answer
  • 2020-12-20 01:56

    Today there is no way to directly express "output only when the combined result has changed".

    One approach that may reduce data volume, depending on your pipeline: use .discardingFiredPanes() and follow the GroupByKey with a filter that immediately drops any zero values, where "zero" means the identity element of your CombineFn. With discarded panes, each firing carries only the contribution of the elements that arrived since the previous firing, so a zero pane means nothing changed for that key. This relies on the associativity requirement of Combine, which means you must be able to calculate the incremental "annoying-ness" of a sentence independently, without reference to the history.
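
    To make the effect of discarding panes plus a zero filter concrete, here is a minimal plain-Java simulation for a single key. It does not use the Beam API. The class name PaneDeltaSketch, the integer "annoying-ness" scores, and the choice of integer sum as the combine (identity 0) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation (NOT Beam API) of what one key sees across trigger firings.
// All names and the integer-sum combine are hypothetical, chosen for illustration.
final class PaneDeltaSketch {
    // Assumed identity ("zero") element of the combine; for integer sum this is 0.
    static final int IDENTITY = 0;

    // In discarding mode, each pane combines only the elements that arrived
    // since the previous firing, so every pane is an independent delta.
    static List<Integer> discardingPanes(List<List<Integer>> elementsPerFiring) {
        List<Integer> panes = new ArrayList<>();
        for (List<Integer> batch : elementsPerFiring) {
            int delta = IDENTITY;
            for (int score : batch) {
                delta += score;
            }
            panes.add(delta);
        }
        return panes;
    }

    // The filter placed right after the grouping step: drop panes equal to the identity.
    static List<Integer> dropIdentity(List<Integer> panes) {
        List<Integer> kept = new ArrayList<>();
        for (int pane : panes) {
            if (pane != IDENTITY) {
                kept.add(pane);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Four firings for one user; the second and third contribute nothing new.
        List<List<Integer>> firings =
                List.of(List.of(1, 1), List.of(0, 0), List.of(), List.of(3));
        System.out.println(dropIdentity(discardingPanes(firings))); // prints [2, 3]
    }
}
```

    Note that what reaches downstream is a stream of deltas, not running totals, so whatever consumes the output has to fold them together itself.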

    When BEAM-23 (cross-bundle mutable per-key-and-window state for ParDo) is implemented, you will be able to manually maintain the state and implement this sort of "only send output when the result changes" logic yourself.
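
    The "only send output when the result changes" logic could look roughly like the following plain-Java sketch. It is not Beam code: two HashMaps stand in for the per-key state, and the class name ChangeOnlyEmitterSketch, the integer scores, and the threshold rule for "annoying" are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch (NOT Beam API): HashMaps stand in for per-key state.
// The threshold rule for "annoying" is a hypothetical example.
final class ChangeOnlyEmitterSketch {
    private final Map<String, Integer> totalByUser = new HashMap<>();
    private final Map<String, Boolean> lastEmitted = new HashMap<>();
    private final int threshold;

    // Collected output; a real pipeline would emit downstream instead.
    final List<String> output = new ArrayList<>();

    ChangeOnlyEmitterSketch(int threshold) {
        this.threshold = threshold;
    }

    // Called once per sentence; emits only when the derived boolean changes.
    void process(String userId, int sentenceScore) {
        int total = totalByUser.merge(userId, sentenceScore, Integer::sum);
        boolean annoying = total >= threshold;
        Boolean previous = lastEmitted.put(userId, annoying);
        if (previous == null || previous.booleanValue() != annoying) {
            output.add(userId + "=" + annoying);
        }
    }

    public static void main(String[] args) {
        ChangeOnlyEmitterSketch emitter = new ChangeOnlyEmitterSketch(3);
        emitter.process("u1", 1); // first result for u1, emitted
        emitter.process("u1", 1); // still not annoying, suppressed
        emitter.process("u1", 1); // crosses the threshold, emitted
        emitter.process("u2", 5); // first result for u2, emitted
        emitter.process("u1", 2); // still annoying, suppressed
        System.out.println(emitter.output); // prints [u1=false, u1=true, u2=true]
    }
}
```

    Five inputs produce three outputs here, because repeated results for the same key are suppressed.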

    However, I think this scenario likely deserves explicit consideration in the model. It blends the concepts embodied today by triggers and the accumulation mode.
