How to write to a file name defined at runtime?

Submitted by 三世轮回 on 2019-12-02 02:30:17

Question


I want to write to a file on Google Cloud Storage (a gs:// path), but I don't know the file name at compile time. The name depends on behavior that is only defined at runtime. How can I proceed?


Answer 1:


If you're using the Beam Java SDK, you can use FileIO.writeDynamic() for this. It is available starting with Beam 2.3, which is currently in the process of being released, but you can already use it via version 2.3.0-SNAPSHOT. Alternatively, you can use the older DynamicDestinations API, available in Beam 2.2.

Example of using FileIO.writeDynamic() to write a PCollection of bank transactions to different paths on GCS depending on the transaction's type:

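// defaultNaming is FileIO.Write.defaultNaming (use a static import).
PCollection<BankTransaction> transactions = ...;
transactions.apply(
    FileIO.<TransactionType, BankTransaction>writeDynamic()
      // Destination key: one group of output files per transaction type.
      .by(BankTransaction::getType)
      // Convert each transaction to a line of text and write it with a text sink.
      .via(BankTransaction::toString, TextIO.sink())
      .to("gs://bucket/myfolder/")
      // Include the type in the file name so each destination gets distinct files.
      .withNaming(type -> defaultNaming("transactions_" + type, ".txt")));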

For an example of DynamicDestinations use, see example code in the TextIO unit tests.

Alternatively, if you want to write each record to its own file, just use the FileSystems API (in particular, FileSystems.create()) from a DoFn.




Answer 2:


For the Python crowd:

An experimental write transform was added to the Beam Python SDK in 2.14.0, beam.io.fileio.WriteToFiles:

my_pcollection | beam.io.fileio.WriteToFiles(
      path='/my/file/path',
      destination=lambda record: 'avro' if record['type'] == 'A' else 'csv',
      sink=lambda dest: AvroSink() if dest == 'avro' else CsvSink(),
      file_naming=beam.io.fileio.destination_prefix_naming())

which can be used to write to different files per-record.

If your file name is based on data within your PCollection, you can use the destination and file_naming parameters to create files based on each record's data.
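
To make the roles of destination and file_naming more concrete, here is a minimal plain-Python sketch of the routing idea, using only the standard library. It is not Beam's implementation and does not call the Beam API: route_records, destination_fn, and naming_fn are hypothetical names chosen for illustration, and the records are made-up sample data.

```python
import os
import tempfile
from collections import defaultdict


def route_records(records, base_dir, destination_fn, naming_fn):
    """Group records by a runtime-computed destination, one file per group.

    Hypothetical helper for illustration only; it mimics the idea behind
    destination= and file_naming=, not Beam's actual behavior.
    """
    groups = defaultdict(list)
    for record in records:
        # The destination is derived from the record itself, at runtime.
        groups[destination_fn(record)].append(record)

    written = {}
    for dest, group in groups.items():
        # The file name is derived from the destination, not fixed in advance.
        path = os.path.join(base_dir, naming_fn(dest))
        with open(path, "w", encoding="utf-8") as f:
            for record in group:
                f.write(f"{record['id']},{record['type']}\n")
        written[dest] = path
    return written


# Made-up sample records.
records = [
    {"id": 1, "type": "A"},
    {"id": 2, "type": "B"},
    {"id": 3, "type": "A"},
]

out_dir = tempfile.mkdtemp()
paths = route_records(
    records,
    out_dir,
    destination_fn=lambda r: "type_a" if r["type"] == "A" else "other",
    naming_fn=lambda dest: f"{dest}.txt",
)
```

In this sketch, destination_fn plays the part of destination (record to destination key) and naming_fn plays the part of file_naming (destination key to file name). In a real pipeline, WriteToFiles handles this per window and shard, so actual file names carry more information than the single name used here.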

More documentation here:

https://beam.apache.org/releases/pydoc/2.14.0/apache_beam.io.fileio.html#dynamic-destinations

And the JIRA issue here:

https://issues.apache.org/jira/browse/BEAM-2857



Source: https://stackoverflow.com/questions/48519834/how-to-write-to-a-file-name-defined-at-runtime
