Staging files on a Google Dataflow worker

Asked by 轮回少年, 2021-01-19 02:16

Is there anything in the Dataflow SDK that would allow me to stage resource files on a worker? I have specific static file resources that I need to make available on the worker's filesystem.

1 Answer
  • 2021-01-19 03:08

    You can use the --filesToStage pipeline option to list the files that should be staged to the worker. There are a few issues to be aware of:

    1. By default, the Dataflow SDK sets --filesToStage to all of the files in your classpath, which ensures that the code needed to run your pipeline is available to the worker. If you override this option, you'll need to make sure it still includes your pipeline code.
    2. The files on the worker (which will be in the classpath) will have an MD5 hash appended to their names. So if you specified --filesToStage=foo.zip, the file on the worker would be named foo-<someHash>.zip. You would need to iterate over all the files in the classpath to find the appropriate one.

    See the documentation on --filesToStage at https://cloud.google.com/dataflow/pipelines/executing-your-pipeline for more information.
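    Since the exact staged name isn't known in advance (the hash is appended at staging time), one way to locate the file on the worker is to scan the classpath entries for a name that starts with the original base name. A minimal sketch in Java, matching point 2 above (the class and method names here are illustrative, not part of the Dataflow SDK):

    ```java
    import java.io.File;
    import java.util.Arrays;
    import java.util.Optional;

    public class StagedFileFinder {

        // Search the JVM classpath for a staged file. Dataflow renames a staged
        // file such as "foo.zip" to "foo-<someHash>.zip", so we match entries
        // whose name starts with the base name plus "-" and keeps the extension.
        public static Optional<File> findStagedFile(String baseName, String extension) {
            String classpath = System.getProperty("java.class.path", "");
            return Arrays.stream(classpath.split(File.pathSeparator))
                    .map(File::new)
                    .filter(f -> f.getName().startsWith(baseName + "-")
                            && f.getName().endsWith(extension))
                    .findFirst();
        }

        public static void main(String[] args) {
            // Example lookup for a file originally staged as "foo.zip".
            findStagedFile("foo", ".zip").ifPresentOrElse(
                    f -> System.out.println("Found staged file: " + f.getAbsolutePath()),
                    () -> System.out.println("No staged file matching foo-*.zip on the classpath"));
        }
    }
    ```

    This relies only on the fact stated above that staged files end up on the worker's classpath with a hash suffix; if multiple staged files share a base name, you may need a stricter filter.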
