Usage problem: add_value_provider_argument in a streaming pipeline (Apache Beam / Python)

Submitted by 一曲冷凌霜 on 2020-06-17 02:28:47

Question


We want to create a custom Dataflow template whose parameters are declared with add_value_provider_argument().

We are unable to run the following command without supplying values for the variables defined in add_value_provider_argument() at template-creation time.

class UserOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):     
        parser.add_value_provider_argument(
            '--input_topic',
            help='The Cloud Pub/Sub topic to read from.\n'
                 '"projects/<PROJECT_NAME>/topics/<TOPIC_NAME>".'
        )
        parser.add_value_provider_argument(
            '--window_size',
            type=float,
            default=1.0,
            help='Output file\'s window size in number of minutes.'
        )
        parser.add_value_provider_argument(
            '--output_path',
            help='GCS Path of the output file including filename prefix.'
        )

def run():
    pipeline_options = PipelineOptions(streaming=True, save_main_session=True)
    custom_options = pipeline_options.view_as(UserOptions)

    with beam.Pipeline(options=custom_options) as pipeline:
        print("ceci est un test", custom_options.input_topic)
        (pipeline
         | 'Read PubSub Messages' >> beam.io.ReadFromPubSub(topic=custom_options.input_topic.get())
         | 'Window into' >> GroupWindowsIntoBatches(custom_options.window_size.get())
         | 'Write to GCS' >> beam.ParDo(WriteBatchesToGCS(custom_options.output_path.get()))
        )

if __name__ == '__main__':
    run()

I execute this file with:

python luckycart_check.py \
    --runner DataflowRunner \
    --project $PROJECT_NAME \
    --staging_location gs://$BUCKET_NAME/staging \
    --temp_location gs://$BUCKET_NAME/temp \
    --template_location gs://$BUCKET_NAME/templates/luckycartTEMPLATE

and I get the following error:

 File "/home/jupyter/env/local/lib/python2.7/site-packages/apache_beam/options/value_provider.py", line 106, in get
    '%s.get() not called from a runtime context' % self)
apache_beam.error.RuntimeValueProviderError: RuntimeValueProvider(option: input_topic, type: str, default_value: None).get() not called from a runtime context

Answer 1:


If you don't specify --input_topic when creating the template, the option will be a RuntimeValueProvider, meaning you can only call get() on it while the Dataflow job is actually running. This is expected behavior.

Some transforms, such as WriteToBigQuery, accept ValueProvider arguments directly (without calling .get()). ReadFromPubSub, however, does not currently accept ValueProvider arguments, because it is implemented as a native transform in Dataflow.
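The general fix for the steps that do support deferred values is to pass the ValueProvider object itself into the DoFn and call .get() inside process(), which only runs once the job is executing on the workers. The sketch below illustrates the pattern for a class like the question's WriteBatchesToGCS; the StaticValueProvider stand-in is a local stub so the snippet runs without Beam installed, and in real code the class would extend beam.DoFn:

```python
# Local stand-in for apache_beam.options.value_provider.StaticValueProvider,
# used here only so the sketch runs without Beam installed.
class StaticValueProvider:
    def __init__(self, value):
        self._value = value

    def get(self):
        return self._value


class WriteBatchesToGCS:  # in real code: class WriteBatchesToGCS(beam.DoFn)
    def __init__(self, output_path):
        # Store the ValueProvider itself; do NOT call .get() here,
        # because __init__ runs at graph-construction (template) time.
        self.output_path = output_path

    def process(self, batch):
        # .get() is safe here: process() only runs at job runtime.
        path = self.output_path.get()
        yield (path, batch)


dofn = WriteBatchesToGCS(StaticValueProvider("gs://bucket/out"))
print(list(dofn.process(["msg1", "msg2"])))
# → [('gs://bucket/out', ['msg1', 'msg2'])]
```

The same idea applies to window_size: keep custom_options.window_size (no .get()) at construction time and defer the .get() call into a DoFn or callable that executes on the workers.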

See this documentation for more on creating templates with ValueProviders: https://cloud.google.com/dataflow/docs/guides/templates/creating-templates
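Once the template has been created at the --template_location path from the question, runtime values for the parameters that do accept ValueProviders (here window_size and output_path) are supplied when launching it, for example with gcloud. The job name and region below are placeholders:

```shell
# Launch the template, supplying the runtime ValueProvider parameters.
gcloud dataflow jobs run luckycart-job \
    --gcs-location gs://$BUCKET_NAME/templates/luckycartTEMPLATE \
    --region us-central1 \
    --parameters window_size=2.0,output_path=gs://$BUCKET_NAME/output/
```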



Source: https://stackoverflow.com/questions/58838705/usage-problem-add-value-provider-argument-on-a-streaming-stream-apache-beam-p
