Question
I have a PCollection that looks like this:
PCollection<KV<KV<String, EventSession>, Long>> windowed_counts
My goal is to write this out as a text file. I thought to use something like:
windowed_counts.apply( TextIO.Write.to( "output" ));
but I am having a hard time getting the Coders set up correctly. This is what I thought would work:
KvCoder kvcoder = KvCoder.of(KvCoder.of(StringUtf8Coder.of(), AvroDeterministicCoder.of(EventSession.class) ), TextualLongCoder.of());
TextIO.Write.Bound io = TextIO.Write.withCoder( kvcoder );
windowed_counts.apply( io.to( "output" ));
where TextualLongCoder is my own subclass of AtomicCoder, analogous to TextualIntegerCoder. The EventSession class is annotated to use AvroDeterministicCoder as its default coder.
But with this I get garbled output that includes non-textual characters, etc. Can anybody advise on how to write this particular PCollection out as text? I'm sure there's something obvious I'm missing here...
Answer 1:
Did you try creating a transform that will convert a PCollection of KV<KV<String, EventSession>, Long> to a PCollection of Strings and then writing it into a text file?
I found that to be the most flexible approach for my needs.
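A minimal sketch of the formatting step, written in plain Java so it runs without the Dataflow SDK. Here `Map.Entry` stands in for `KV`, and the `EventSession` class with its `id` field and `toString()` is hypothetical, since the question does not show the real class. The tab separator is also just an assumption. In a pipeline, a function like `formatLine` would presumably be called from a `ParDo` that outputs `String`, and the resulting `PCollection<String>` could then go to `TextIO.Write.to("output")` with no custom coder.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

class FormatCounts {
    // Hypothetical stand-in for the asker's EventSession; the real fields are not shown.
    static final class EventSession {
        final String id;

        EventSession(String id) {
            this.id = id;
        }

        @Override
        public String toString() {
            return "EventSession(" + id + ")";
        }
    }

    // Flattens one ((key, session), count) element into a single tab-separated line.
    // Map.Entry is used here in place of the SDK's KV type (an assumption for this sketch).
    static String formatLine(Map.Entry<Map.Entry<String, EventSession>, Long> element) {
        Map.Entry<String, EventSession> key = element.getKey();
        return key.getKey() + "\t" + key.getValue() + "\t" + element.getValue();
    }

    public static void main(String[] args) {
        Map.Entry<String, EventSession> key =
                new SimpleEntry<>("user-1", new EventSession("s42"));
        Map.Entry<Map.Entry<String, EventSession>, Long> element =
                new SimpleEntry<>(key, 3L);

        // Each element becomes one human-readable line of text.
        System.out.println(formatLine(element));
    }
}
```

Because the conversion to text happens in your own code, you control the separator and how `EventSession` is rendered, rather than relying on a coder whose encoded bytes may not be readable text.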
Source: https://stackoverflow.com/questions/29131859/using-textio-write-with-a-complicated-pcollection-type-in-google-cloud-dataflow