Using TextIO.Write with a complicated PCollection type in Google Cloud Dataflow

Submitted by 南楼画角 on 2020-01-14 04:04:51

Question


I have a PCollection that looks like this:

PCollection<KV<KV<String, EventSession>, Long>> windowed_counts

My goal is to write this out as a text file. I thought to use something like:

windowed_counts.apply( TextIO.Write.to( "output" ));

but I am having a hard time getting the Coders set up correctly. This is what I thought would work:

    KvCoder kvcoder = KvCoder.of(KvCoder.of(StringUtf8Coder.of(), AvroDeterministicCoder.of(EventSession.class) ), TextualLongCoder.of());
    TextIO.Write.Bound io = TextIO.Write.withCoder( kvcoder );
    windowed_counts.apply( io.to( "output" ));

where TextualLongCoder is my own subclass of AtomicCoder, analogous to TextualIntegerCoder. The EventSession class is annotated to use AvroDeterministicCoder as its default coder.

But with this I get garbled output that includes non-textual characters. Can anybody advise on how to write this particular PCollection out as text? I'm sure there's something obvious I'm missing here...


Answer 1:


Have you tried adding a transform that converts the PCollection of KV<KV<String, EventSession>, Long> into a PCollection of Strings, and then writing that PCollection to a text file?

I found this to be the most flexible approach for my needs.
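As a rough illustration of that conversion step, here is a minimal sketch of only the formatting logic, written in plain Java so it runs without the Dataflow SDK. Everything in it is an assumption made for illustration: java.util.Map.Entry stands in for KV, the EventSession fields (sessionId, startMillis, endMillis) are hypothetical because the question does not show the real class, and the tab-separated layout is just one possible choice. In an actual pipeline, logic like this would presumably sit inside a DoFn applied with ParDo, so that a PCollection of Strings is what reaches TextIO.Write.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical stand-in: the real EventSession's fields are not shown in the question.
class EventSession {
    final String sessionId;
    final long startMillis;
    final long endMillis;

    EventSession(String sessionId, long startMillis, long endMillis) {
        this.sessionId = sessionId;
        this.startMillis = startMillis;
        this.endMillis = endMillis;
    }
}

class SessionCountFormatter {
    // Flattens one nested key/value element into a single tab-separated line.
    // Map.Entry is used here as a stand-in for the SDK's KV type.
    static String formatLine(Map.Entry<Map.Entry<String, EventSession>, Long> element) {
        String key = element.getKey().getKey();
        EventSession session = element.getKey().getValue();
        long count = element.getValue();
        return String.join("\t",
                key,
                session.sessionId,
                Long.toString(session.startMillis),
                Long.toString(session.endMillis),
                Long.toString(count));
    }

    public static void main(String[] args) {
        EventSession session = new EventSession("s-1", 1000L, 2500L);
        Map.Entry<String, EventSession> inner = new SimpleEntry<>("user42", session);
        Map.Entry<Map.Entry<String, EventSession>, Long> element = new SimpleEntry<>(inner, 7L);
        // Prints the element as one line of plain text.
        System.out.println(formatLine(element));
    }
}
```

Because the element is already a String by the time it is written, no custom coder for the nested KV type is needed on the write step; the output format is decided entirely by the formatting function.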



Source: https://stackoverflow.com/questions/29131859/using-textio-write-with-a-complicated-pcollection-type-in-google-cloud-dataflow
