Question
I am following the answer to this post and the documentation to perform a dynamic windowed write of my data at the end of a pipeline. Here is what I have so far:
static void applyWindowedWrite(PCollection<String> stream) {
    stream.apply(
        FileIO.<String, String>writeDynamic()
            .by(Event::getKey)
            .via(TextIO.sink())
            .to("gs://some_bucket/events/")
            .withNaming(key -> defaultNaming(key, ".json")));
}
But NetBeans flags a compile error on the last line:
FileNaming is not public in Write; cannot be accessed outside package
How do I make defaultNaming available to my pipeline so that I can use it for dynamic writes? Or, if that isn't possible, what should I be doing instead?
Answer 1:
Posting what I figured out in case someone else comes across this.
There were three issues with how I was attempting to use writeDynamic() before.
- Previously I had been using Beam 2.3.0, in which FileNaming is declared inside FileIO.Write without public access (hence the error above). Beam 2.4.0 defines FileNaming as a public static interface, making it available outside the package.
- Fully qualifying defaultNaming. Rather than calling defaultNaming unqualified, as the example documentation does, it has to be invoked as FileIO.Write.defaultNaming (or brought into scope with a static import), since FileIO is the class I actually imported.
- Adding withDestinationCoder was also required to perform the dynamic write.
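The second point comes down to ordinary Java scoping: a static method declared on a static nested class such as FileIO.Write is not in scope by its simple name, so it must either be qualified or statically imported (for example `import static org.apache.beam.sdk.io.FileIO.Write.defaultNaming;`). The minimal sketch below shows the rule with plain JDK code. The classes Outer, Write, and NamingCaller are hypothetical stand-ins that only mirror the shape of Beam's API, and the placeholder defaultNaming just concatenates its arguments instead of returning a FileNaming.

```java
// Hypothetical stand-in: mirrors only the *shape* of Beam's FileIO.Write.defaultNaming
// (a static method on a static nested class). It is not Beam's implementation.
final class Outer {
    static final class Write {
        static String defaultNaming(String prefix, String suffix) {
            // Placeholder body; Beam's real defaultNaming returns a FileNaming, not a String.
            return prefix + suffix;
        }
    }
}

final class NamingCaller {
    static String name(String key) {
        // An unqualified call such as defaultNaming(key, ".json") would not compile here,
        // because a nested class's static methods are not in scope without a static import.
        return Outer.Write.defaultNaming(key, ".json");
    }

    public static void main(String[] args) {
        System.out.println(name("events/clicks")); // prints events/clicks.json
    }
}
```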
The final solution ended up looking like this.
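// Imports needed by this snippet (Event is the user's own class in the same package):
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.values.PCollection;

static void applyWindowedWrite(PCollection<String> stream) {
    stream.apply(FileIO.<String, String>writeDynamic()
            .by(Event::getKey)                           // destination key for each element
            .via(TextIO.sink())                          // write each element as a line of text
            .to("gs://some_bucket/events/")              // output directory
            .withDestinationCoder(StringUtf8Coder.of())  // coder for the String destination keys
            .withNumShards(1)                            // one shard per destination per pane
            .withNaming(key -> FileIO.Write.defaultNaming(key, ".json")));
}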
Here, Event::getKey refers to a static method defined in the same package with the signature public static String getKey(String event).
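The post does not show the body of Event.getKey. As a hypothetical sketch, assuming each event string has the form `<key>|<payload>` (a format invented here purely for illustration), a helper with that signature could look like the following. The main method shows that a static String-to-String method works as a method reference to a one-argument functional interface, which is how it satisfies `.by(...)`, whose parameter is a `SerializableFunction<UserT, DestinationT>`.

```java
import java.util.function.Function;

// Hypothetical sketch: the original post does not show Event.getKey's body.
// ASSUMPTION: each event is a string of the form "<key>|<payload>".
final class Event {
    private Event() {}

    /** Returns the destination key for an event, e.g. "clicks" for "clicks|{...}". */
    public static String getKey(String event) {
        int sep = event.indexOf('|');
        // Events without a separator go to an assumed fallback destination.
        return sep < 0 ? "unknown" : event.substring(0, sep);
    }

    public static void main(String[] args) {
        // A static String -> String method fits any one-argument functional interface;
        // Beam's SerializableFunction<String, String> accepts Event::getKey the same way.
        Function<String, String> keyFn = Event::getKey;
        System.out.println(keyFn.apply("clicks|{\"user\":42}")); // prints clicks
        System.out.println(keyFn.apply("no-separator"));          // prints unknown
    }
}
```

Keeping the key function static and free of captured state also keeps it trivially serializable, which matters because Beam ships it to workers.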
The final pipeline performs a windowed write that produces one file per destination key per window pane, because .withNumShards(1) fixes the number of shards at one. It assumes the windowing has already been applied in an earlier step. No GroupByKey is needed before writing, since the write performs that grouping itself whenever the number of shards is set explicitly. See the FileIO documentation, under "Writing files -> How many shards are generated per pane", for more details.
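To make the file count concrete, here is a conceptual sketch in plain Java (no Beam dependency) that lists which files such a write would produce. It assumes, hypothetically, fixed 60-second windows, exactly one pane per window, and that every (key, window) pair containing at least one element yields exactly one file when `.withNumShards(1)` is set. The TimedEvent type and the window size are invented for illustration; Beam's real file names come from defaultNaming and are not modeled here.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Conceptual sketch, not Beam code. ASSUMPTIONS: fixed windows of WINDOW_SECONDS,
// exactly one pane (firing) per window, and withNumShards(1), so each
// (key, window) pair that receives at least one element yields one file.
final class FilesPerWindow {
    static final long WINDOW_SECONDS = 60; // hypothetical window size

    // Hypothetical timestamped event: destination key plus event time in seconds.
    static final class TimedEvent {
        final String key;
        final long timestampSeconds;

        TimedEvent(String key, long timestampSeconds) {
            this.key = key;
            this.timestampSeconds = timestampSeconds;
        }
    }

    /** Maps each key to the start times of the windows that would receive a file. */
    static Map<String, TreeSet<Long>> filesByKey(List<TimedEvent> events) {
        Map<String, TreeSet<Long>> result = new TreeMap<>();
        for (TimedEvent e : events) {
            // A fixed window starts at the largest multiple of WINDOW_SECONDS <= the timestamp.
            long windowStart = Math.floorDiv(e.timestampSeconds, WINDOW_SECONDS) * WINDOW_SECONDS;
            result.computeIfAbsent(e.key, k -> new TreeSet<>()).add(windowStart);
        }
        return result;
    }

    public static void main(String[] args) {
        List<TimedEvent> events = List.of(
                new TimedEvent("clicks", 5),
                new TimedEvent("clicks", 59),
                new TimedEvent("clicks", 61),
                new TimedEvent("views", 30));
        // Three files: clicks in the windows starting at 0 and 60, views in the window starting at 0.
        System.out.println(filesByKey(events)); // prints {clicks=[0, 60], views=[0]}
    }
}
```

If a trigger fires several panes for the same window, each pane would generally produce its own file, so under those conditions this count would be a lower bound.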
Source: https://stackoverflow.com/questions/50223935/using-defaultnaming-for-dynamic-windowed-writes-in-apache-beam