Notifying Google PubSub when Dataflow job is complete

Submitted by 微笑、不失礼 on 2019-12-11 04:48:36

Question


Is there a way to publish a message onto Google Pubsub after a Google Dataflow job completes? We have a need to notify dependent systems that the processing of incoming data is complete. How could Dataflow publish after writing data to the sink?

EDIT: We want to notify after a pipeline completes writing to GCS. Our pipeline looks like this:

 
Pipeline p = Pipeline.create(options);
p.apply(....)
 .apply(AvroIO.Write.named("Write to GCS")
              .withSchema(Extract.class)
              .to(options.getOutputPath())
              .withSuffix(".avro"));
p.run();

If we add logic after the pipeline.apply(...) calls, it runs when the driver code finishes executing, not when the pipeline itself has completed. Ideally we could add another .apply(...) after the AvroIO sink and publish a message to PubSub.


Answer 1:


There are two ways to find out when your pipeline finishes, after which you can publish a message or do whatever else you need:

  1. Use the BlockingDataflowPipelineRunner. This runs your pipeline synchronously, so the call to run() returns only once the job has finished.
  2. Use the DataflowPipelineRunner. This runs your pipeline asynchronously. You can then poll the pipeline for its status and wait for it to finish.
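The polling approach in option 2 can be sketched roughly as below. This is a minimal, dependency-free illustration rather than Dataflow SDK code: the `State` enum, the `waitAndNotify` helper, and the `Supplier`/`Consumer` parameters are all hypothetical stand-ins. In a real program the supplier would read the state of the job returned by `p.run()`, and the consumer would hand the message to a Pub/Sub client.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Consumer;
import java.util.function.Supplier;

public class CompletionNotifier {

    /** Hypothetical job states; a real runner exposes its own state type. */
    public enum State { RUNNING, DONE, FAILED, CANCELLED }

    /** Every state other than RUNNING is treated as terminal in this sketch. */
    static boolean isTerminal(State state) {
        return state != State.RUNNING;
    }

    /**
     * Polls stateSupplier until it reports a terminal state, then passes a
     * notification message to publisher and returns the final state.
     * Both parameters are assumptions standing in for the job handle and
     * the Pub/Sub publisher.
     */
    public static State waitAndNotify(Supplier<State> stateSupplier,
                                      Consumer<String> publisher,
                                      long pollIntervalMillis) {
        State state = stateSupplier.get();
        while (!isTerminal(state)) {
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
            state = stateSupplier.get();
        }
        publisher.accept("job finished with state " + state);
        return state;
    }

    public static void main(String[] args) {
        // Simulated job: reports RUNNING twice, then DONE.
        Iterator<State> states =
            Arrays.asList(State.RUNNING, State.RUNNING, State.DONE).iterator();
        // Here the "publisher" just prints the message.
        waitAndNotify(states::next, System.out::println, 10L);
        // prints "job finished with state DONE"
    }
}
```

The notification is sent for any terminal state, so dependent systems can tell a failed or cancelled job apart from a successful one by inspecting the message.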


Source: https://stackoverflow.com/questions/38526424/notifying-google-pubsub-when-dataflow-job-is-complete
