Using defaultNaming for dynamic windowed writes in Apache Beam

安稳与你 提交于 2019-12-24 00:45:16

问题


I am following along with answer to this post and the documentation in order to perform a dynamic windowed write on my data at the end of a pipeline. Here is what I have so far:

static void applyWindowedWrite(PCollection<String> stream) {
    stream.apply(
        FileIO.<String, String>writeDynamic()
            .by(Event::getKey)
            .via(TextIO.sink())
            .to("gs://some_bucket/events/")
            .withNaming(key -> defaultNaming(key, ".json")));
}

But NetBeans warns me about a syntax error on the last line:

FileNaming is not public in Write; cannot be accessed outside package

How do I make defaultNaming available to my pipeline so that I can use it for dynamic writes. Or, if that isn't possible, what should I be doing instead?


回答1:


Posting what I figured out in case someone else comes across this.

There were three issues with how I was attempting to use writeDynamic() before.

  1. Previously I had been using Beam version 2.3.0, which does indeed describe FileNaming as a class internal to FileIO.Write. Beam 2.4.0 defines FileNaming as a public static interface making it available externally.
  2. Fully resolving/importing defaultNaming. Rather than calling defaultNaming directly - as it is called in the example documentation - it must be invoked as FileIO.Write.defaultNaming since FileIO is the package I actually imported.
  3. Adding withDestinationCoder was also required to perform the dynamic write.

The final solution ended up looking like this.

static void applyWindowedWrite(PCollection<String> stream) {
    stream.apply(FileIO.<String, String>writeDynamic()
                .by(Event::getKey)
                .via(TextIO.sink())
                .to("gs://some_bucket/events/")
                .withDestinationCoder(StringUtf8Coder.of())
                .withNumShards(1)
                .withNaming(key -> FileIO.Write.defaultNaming(key, ".json")));
}

Where Event::getKey is a static function defined within the same package with the signature public static String getKey(String event).

This performs a windowed write which will write one file per window (as defined by the .withNumShards(1) method). This assumes the window has been defined in a previous step. A GroupByKey is not required prior to writing since it is done in the process of writing whenever the number of shards is defined explicitly. See the FileIO documentation for more details under "Writing files -> How many shards are generated per pane".



来源:https://stackoverflow.com/questions/50223935/using-defaultnaming-for-dynamic-windowed-writes-in-apache-beam

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!