Write streaming data to GCS using Apache Beam

Submitted by 纵然是瞬间 on 2019-12-24 16:34:08

Question


How can I write messages received from PubSub to a text file in GCS using TextIO in Apache Beam? I saw methods such as withWindowedWrites() and withFilenamePolicy(), but couldn't find any examples of them in the documentation.


Answer 1:


Here is an example, assuming you are using the Java SDK (Beam 2.1.0).

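import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.joda.time.Duration;

PipelineOptions options = PipelineOptionsFactory.fromArgs(args)
        .withValidation()
        .as(PipelineOptions.class);
Pipeline pipeline = Pipeline.create(options);

pipeline.begin()
        // Read messages as strings; event time is taken from the "timestamp" attribute.
        .apply("PubsubIO", PubsubIO.readStrings()
                .withTimestampAttribute("timestamp")
                .fromSubscription("projects/YOUR-PROJECT/subscriptions/YOUR-SUBSCRIPTION"))
        // Split the unbounded stream into 30-second fixed windows, then write per window.
        .apply(Window.<String>into(FixedWindows.of(Duration.standardSeconds(30L))))
        .apply(TextIO.write().to("gs://YOUR-BUCKET").withWindowedWrites());

pipeline.run();  // nothing executes until the pipeline is run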

To see the defaults that the SDK uses for file naming, explore the "expand" method, TextIO.Write.expand(PCollection input). Specifically, I'd take a look at DefaultFilenamePolicy.java.
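
To make the 30-second fixed windows in the pipeline above more concrete, here is a minimal plain-Java sketch (no Beam dependency) of how a timestamp maps to a fixed window. It assumes windows are aligned to the epoch with a zero offset, which I believe is the default for FixedWindows.of(...). The class and method names are hypothetical and are not part of the Beam API.

```java
import java.time.Duration;
import java.time.Instant;

/** Plain-Java sketch of fixed-window assignment; illustrative only, not Beam code. */
final class FixedWindowSketch {

    /**
     * Returns {start, end} in epoch milliseconds for the fixed window that
     * contains the given timestamp. The end bound is exclusive.
     */
    static long[] windowFor(long timestampMillis, long sizeMillis) {
        // floorMod keeps the arithmetic correct for timestamps before the epoch.
        long start = timestampMillis - Math.floorMod(timestampMillis, sizeMillis);
        return new long[] {start, start + sizeMillis};
    }

    public static void main(String[] args) {
        long size = Duration.ofSeconds(30).toMillis();
        Instant eventTime = Instant.parse("2019-12-24T16:34:08Z");

        long[] window = windowFor(eventTime.toEpochMilli(), size);

        // Print the window bounds as ISO-8601 instants.
        System.out.println(Instant.ofEpochMilli(window[0])
                + " .. " + Instant.ofEpochMilli(window[1]));
    }
}
```

A message whose event time is 16:34:08 lands in the window that starts at 16:34:00 and ends just before 16:34:30. With windowed writes, each such window is written out as its own set of files, which is why the filename policy needs access to the window when building file names.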



Source: https://stackoverflow.com/questions/46909579/write-streaming-data-to-gcs-using-apache-beam
