Custom Apache Beam Python version in Dataflow

Submitted by ☆樱花仙子☆ on 2019-12-05 12:48:40

I am answering my own question, since I found the answer in an Apache Beam JIRA issue I have been helping with.

If you want to use a custom Apache Beam Python version in Google Cloud Dataflow (that is, run your pipeline with --runner DataflowRunner), you must pass the option --sdk_location <apache_beam_v1.2.3.tar.gz> when you run your pipeline, where <apache_beam_v1.2.3.tar.gz> is the location of the packaged version that you want to use.

For example, as of this writing, if you have checked out the HEAD version of Apache Beam's git repository, you first have to package it: navigate to the Python SDK with cd beam/sdks/python and then run python setup.py sdist (a compressed tar file will be created in the dist subdirectory).
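
The archive name depends on the version you checked out, so it can be handy to look it up instead of typing it. Here is a minimal sketch, assuming the sdist has already been built as described above. Note that find_sdk_tarball is a hypothetical helper of my own, not part of Apache Beam:

```python
from pathlib import Path


def find_sdk_tarball(dist_dir):
    """Return the most recently modified .tar.gz in dist_dir.

    Hypothetical helper: it only inspects the local filesystem and
    assumes the newest archive is the one you just built.
    """
    archives = sorted(
        Path(dist_dir).glob("*.tar.gz"),
        key=lambda p: p.stat().st_mtime,
    )
    if not archives:
        raise FileNotFoundError(
            f"no .tar.gz found in {dist_dir}; run 'python setup.py sdist' first"
        )
    return archives[-1]
```

Calling find_sdk_tarball("beam/sdks/python/dist") would return the path to pass as --sdk_location.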

Thereafter you can run your pipeline like this:

python your_pipeline.py [...your_options...] --sdk_location beam/sdks/python/dist/apache-beam-2.2.0.dev0.tar.gz

Google Cloud Dataflow will use the supplied SDK.
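
If you launch the pipeline from another script, the same invocation can be assembled as an argument list. This is only a sketch using the two flags mentioned in this answer; build_pipeline_command and its parameter names are hypothetical, and any other options your pipeline needs go in extra_args:

```python
import sys


def build_pipeline_command(pipeline_script, sdk_location, extra_args=()):
    """Assemble the argv for running a pipeline on Dataflow with a custom SDK.

    Hypothetical helper: it only builds the list and does not check
    that sdk_location exists or that the options are valid.
    """
    return [
        sys.executable,          # the current Python interpreter
        pipeline_script,
        *extra_args,             # your other pipeline options, unchanged
        "--runner", "DataflowRunner",
        "--sdk_location", str(sdk_location),
    ]
```

The resulting list can be handed to subprocess.run(cmd, check=True) to start the job.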
