Singleton in Google Dataflow

Submitted by 瘦欲@ on 2019-12-09 04:04:27
Pablo

Dataflow uses multiple machines in parallel to do data analysis, so your API will have to be initialized at least once per machine.

In fact, Dataflow makes no strong guarantees about the lifetime of these machines, so they may come and go relatively frequently.

A simple way to have your job access an external service without initializing the API more often than necessary is to initialize it in your DoFn's @Setup method:

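import org.apache.beam.sdk.transforms.DoFn;

// ExternalServiceHandle stands in for your own client class.
// The String element types are placeholders; use your pipeline's types.
class APICallingDoFn extends DoFn<String, String> {
    // transient: the handle is created on the worker, not serialized with the DoFn
    private transient ExternalServiceHandle handle = null;

    @Setup
    public void initializeExternalAPI() {
        // ... create the handle here; runs once per DoFn instance,
        // before any element is processed
    }

    @ProcessElement
    public void processElement(ProcessContext c) {
        // ... process each element -- setup will have been called
    }
}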

You need to do this because neither Beam nor Dataflow guarantees the lifetime of a DoFn instance or of a worker.
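If the handle is expensive to create and safe to share between threads, one possible refinement is to keep a single lazily created instance per worker JVM and fetch it from the setup method, since a worker may create several DoFn instances. The following is a minimal plain-Java sketch of that idea, not Beam code: ExternalServiceHandle, SharedHandle and ApiCallingFn are hypothetical names, and setup() only plays the role of the @Setup method.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the external service client; not a real library class.
final class ExternalServiceHandle {
    // Counts how many times the (pretend) expensive initialization ran in this JVM.
    static final AtomicInteger INIT_COUNT = new AtomicInteger();

    ExternalServiceHandle() {
        INIT_COUNT.incrementAndGet(); // imagine opening connections, loading credentials, ...
    }

    String call(String input) {
        return "processed:" + input;
    }
}

// Holds at most one handle per JVM, created lazily with double-checked locking.
final class SharedHandle {
    private static volatile ExternalServiceHandle instance;

    private SharedHandle() {}

    static ExternalServiceHandle get() {
        ExternalServiceHandle local = instance;
        if (local == null) {
            synchronized (SharedHandle.class) {
                local = instance;
                if (local == null) {
                    local = new ExternalServiceHandle();
                    instance = local;
                }
            }
        }
        return local;
    }
}

// Plain-Java stand-in for a DoFn: setup() corresponds to the @Setup method.
final class ApiCallingFn {
    private ExternalServiceHandle handle;

    void setup() {
        // Every instance on the same JVM receives the same shared handle.
        handle = SharedHandle.get();
    }

    String process(String element) {
        return handle.call(element);
    }
}
```

With this layout, creating several ApiCallingFn instances in one JVM and calling setup() on each runs the handle's constructor only once. It is still one initialization per worker process, so the "at least once per machine" point above continues to apply.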

Hope this helps.
