Iterative processing in Dataflow

Submitted by 故事扮演 on 2020-01-25 04:16:21

Question


As shown here, Dataflow pipelines are represented by a fixed DAG. I'm wondering whether it's possible to implement a pipeline in which processing continues until a dynamically evaluated condition, based on the data computed so far, is satisfied.

Here's some pseudo code to illustrate what I'd like to implement:

    PCollection pco = null
    while(true):
        pco = pco.apply(someTransform())
        if (conditionSatisfied(pco)):
            break
    pco.Write()

Answer 1:


It seems like you really want iterative computations. Right now Dataflow does not provide support for that, but we are aware that it is a very important use case and we are working on finding the right set of APIs to express it.

For now your workarounds are:

  • Iteratively run whole pipelines (run the pipeline, inspect its output, and run again if the condition is not satisfied). The obvious downside is the pipeline setup and teardown overhead incurred on every run.
  • Build a pipeline with a hard-coded number of iterations by .apply()'ing in a loop unconditionally, then run the whole pipeline.
  • A combination of the two, e.g. run fixed 5-iteration pipelines until you're satisfied with the result.
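
The third workaround can be sketched as a small driver loop. This is a library-agnostic illustration, not Dataflow API code: `run_fixed_iterations` is a hypothetical stand-in for "build a pipeline that applies the transform n times, run it, and read back its output", and `step`, `satisfied`, `iterations_per_run` and `max_runs` are names invented here for the example.

```python
from typing import Callable, List, Tuple

Data = List[float]


def run_fixed_iterations(data: Data, step: Callable[[Data], Data], n: int) -> Data:
    """Hypothetical stand-in for ONE pipeline run with n hard-coded iterations.

    In a real job this is where you would apply the transform n times in a
    loop while building the pipeline, run it, and load the written output.
    """
    for _ in range(n):
        data = step(data)
    return data


def iterate_until(
    data: Data,
    step: Callable[[Data], Data],
    satisfied: Callable[[Data], bool],
    iterations_per_run: int = 5,
    max_runs: int = 20,
) -> Tuple[Data, int]:
    """Re-run fixed-size batches until `satisfied` holds or max_runs is reached."""
    runs = 0
    # The condition is evaluated outside the "pipeline", between runs.
    while runs < max_runs and not satisfied(data):
        data = run_fixed_iterations(data, step, iterations_per_run)
        runs += 1
    return data, runs


# Toy example: halve every value until all of them are below 1.0.
def halve(xs: Data) -> Data:
    return [x / 2 for x in xs]


def all_small(xs: Data) -> bool:
    return all(x < 1.0 for x in xs)


result, runs = iterate_until([100.0, 40.0], halve, all_small, iterations_per_run=5)
print(runs, result)
```

Note the trade-off this makes visible: a larger `iterations_per_run` means fewer pipeline launches (less setup and teardown overhead), but the job may overshoot, because the condition is only checked between runs. In the toy example the values fall below 1.0 after 7 halvings, yet 10 are executed.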


Source: https://stackoverflow.com/questions/32236826/iterative-processing-in-dataflow
