Question
We want to create a custom Dataflow template using add_value_provider_argument(), but we are unable to launch the command below without supplying values for the variables defined via add_value_provider_argument():
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# GroupWindowsIntoBatches and WriteBatchesToGCS are custom transforms
# assumed to be defined elsewhere in the file.

class UserOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        parser.add_value_provider_argument(
            '--input_topic',
            help='The Cloud Pub/Sub topic to read from.\n'
                 '"projects/<PROJECT_NAME>/topics/<TOPIC_NAME>".'
        )
        parser.add_value_provider_argument(
            '--window_size',
            type=float,
            default=1.0,
            help='Output file\'s window size in number of minutes.'
        )
        parser.add_value_provider_argument(
            '--output_path',
            help='GCS Path of the output file including filename prefix.'
        )

def run():
    pipeline_options = PipelineOptions(streaming=True, save_main_session=True)
    custom_options = pipeline_options.view_as(UserOptions)
    with beam.Pipeline(options=custom_options) as pipeline:
        print("this is a test", custom_options.input_topic)
        (pipeline
         | 'Read PubSub Messages' >> beam.io.ReadFromPubSub(topic=custom_options.input_topic.get())
         | 'Window into' >> GroupWindowsIntoBatches(custom_options.window_size.get())
         | 'Write to GCS' >> beam.ParDo(WriteBatchesToGCS(custom_options.output_path.get()))
        )

if __name__ == '__main__':
    run()
I execute this file with:
python luckycart_check.py \
--runner DataflowRunner \
--project $PROJECT_NAME \
--staging_location gs://$BUCKET_NAME/staging \
--temp_location gs://$BUCKET_NAME/temp \
--template_location gs://$BUCKET_NAME/templates/luckycartTEMPLATE \
and I get the following error:
File "/home/jupyter/env/local/lib/python2.7/site-packages/apache_beam/options/value_provider.py", line 106, in get
'%s.get() not called from a runtime context' % self)
apache_beam.error.RuntimeValueProviderError: RuntimeValueProvider(option: input_topic, type: str, default_value: None).get() not called from a runtime context
Answer 1:
If you don't specify --input_topic when creating the pipeline, it will be of type RuntimeValueProvider, meaning you can only get() its value when the Dataflow job is running. This is normal.
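As an illustration of what a runtime context means here, below is a minimal sketch (the AddOutputPathFn name is hypothetical, not from the question): the ValueProvider object is passed whole into a DoFn, and .get() is only called inside process(), which executes once the Dataflow job is running.

import apache_beam as beam

class AddOutputPathFn(beam.DoFn):
    def __init__(self, output_path):
        # output_path is a ValueProvider; do not call .get() here,
        # because __init__ runs while the template is being built.
        self.output_path = output_path

    def process(self, element):
        # .get() is safe here: process() runs in a runtime context.
        yield (self.output_path.get(), element)

The pipeline would then use beam.ParDo(AddOutputPathFn(custom_options.output_path)), deferring resolution of the option until the job actually runs.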
Some transforms, like WriteToBigQuery, accept ValueProvider arguments (without the .get()). However, ReadFromPubSub does not currently accept ValueProvider arguments, since it is implemented as a native transform in Dataflow.
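For comparison, here is a minimal sketch, assuming a hypothetical --output_table value-provider option, of passing a ValueProvider straight to WriteToBigQuery without calling .get(). Note that the Pub/Sub topic still has to be a plain string known when the template is built:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

class BQOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        parser.add_value_provider_argument(
            '--output_table',  # e.g. "project:dataset.table"
            help='BigQuery table to write to.')

def run():
    options = PipelineOptions(streaming=True, save_main_session=True)
    bq_options = options.view_as(BQOptions)
    with beam.Pipeline(options=options) as pipeline:
        (pipeline
         # ReadFromPubSub needs a concrete topic at template-build time.
         | 'Read PubSub' >> beam.io.ReadFromPubSub(
               topic='projects/<PROJECT_NAME>/topics/<TOPIC_NAME>')
         | 'To row' >> beam.Map(lambda msg: {'data': msg.decode('utf-8')})
         # The ValueProvider is passed as-is; WriteToBigQuery resolves
         # it when the job runs, so no .get() is needed.
         | 'Write to BQ' >> beam.io.WriteToBigQuery(
               table=bq_options.output_table,
               schema='data:STRING',
               write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))

if __name__ == '__main__':
    run()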
See this documentation for more on creating templates with ValueProviders: https://cloud.google.com/dataflow/docs/guides/templates/creating-templates
Source: https://stackoverflow.com/questions/58838705/usage-problem-add-value-provider-argument-on-a-streaming-stream-apache-beam-p