Question
I have a Dataflow template created with the command below:
python scrap.py --setup_file /home/deepak_verma/setup.py \
    --temp_location gs://visualization-dev/temp \
    --staging_location gs://visualization-dev/stage \
    --project visualization-dev --job_name scrap-job \
    --subnetwork regions/us-east1/subnetworks/dataflow-internal \
    --region us-east1 --input sentiment_analysis.table_view \
    --output gs://visualization-dev/incoming \
    --runner DataflowRunner \
    --template_location gs://visualization-dev/template/scrap
My Dataflow pipeline accepts the input and output parameters as value providers, like this:
@classmethod
def _add_argparse_args(cls, parser):
    parser.add_value_provider_argument(
        '--input', dest='input', required=True,
        help='Input view, e.g. sentiment_analysis.table_view',
    )
    parser.add_value_provider_argument(
        '--output', dest='output', required=True,
        help='Output GCS file path',
    )
and I use it like this:
beam.io.Read(beam.io.BigQuerySource(
    query=read_query.format(
        table=options.input.get(),
        limit=("limit " + str(LIMIT) if LIMIT else '')),
    use_standard_sql=True))
where read_query is defined as SELECT upc, max_review_date FROM `{table}`
Now, when I launch this template with a different input parameter:
template_body = {
    'jobName': job_name,
    'parameters': {'input': 'table_view2'}
}
credentials = GoogleCredentials.get_application_default()
service = build('dataflow', 'v1b3', credentials=credentials)
request = service.projects().locations().templates().launch(
    projectId=constants.GCP_PROJECT_ID,
    location=constants.REGION,
    gcsPath=template_gcs_path,
    body=template_body)
the Dataflow job does not run against table_view2; instead it still uses table_view.
Answer 1:
The problem is that you are already passing an input value when staging the template, and that staged value is the one being resolved. Remove --input sentiment_analysis.table_view from the first command and leave it unset; specify it only as a runtime parameter when executing the template, with 'parameters': {'input': 'sentiment_analysis.table_view2'}.
If you still need a default value, you can set one when adding the value provider argument, as in this example:
parser.add_value_provider_argument(
    '--input', dest='input',
    help='Input view. sentiment_analysis.table_view',
    default='sentiment_analysis.table_view',
)
# Note: required=True is dropped here; a required argument must always be
# supplied explicitly, so its default would never be used.
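To see why the default only matters when the flag is omitted, here is a plain-argparse sketch (an assumption for illustration: Beam's pipeline-option parser is built on argparse, so defaults behave the same way; the values shown are from the question):

```python
import argparse

# Minimal stand-in for the pipeline-option parser: a default, no required=True.
parser = argparse.ArgumentParser()
parser.add_argument(
    '--input', dest='input',
    default='sentiment_analysis.table_view',
)

# Flag omitted: falls back to the default.
print(parser.parse_args([]).input)
# → sentiment_analysis.table_view

# Flag supplied: the explicit value wins over the default.
print(parser.parse_args(['--input', 'sentiment_analysis.table_view2']).input)
# → sentiment_analysis.table_view2
```

This mirrors the template behavior: a value baked in at staging time acts like an explicit value, which is why it shadows the runtime parameter.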
Answer 2:
What you need is the ability to pass the query itself as a ValueProvider, rather than as an already-formatted string. This is not yet possible in Beam.
There's an open feature request here: https://issues.apache.org/jira/browse/BEAM-1440
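The underlying issue is when .get() is called: at pipeline-construction (staging) time the runtime value does not exist yet, so the query string gets frozen with whatever was available then. Here is a stdlib-only sketch of that timing problem; ValueProviderStub is a hypothetical stand-in written for this illustration, not Beam's real RuntimeValueProvider API:

```python
class ValueProviderStub:
    """Hypothetical stand-in for a runtime value provider: the value is
    unknown when the template is staged and only set when the job launches."""
    def __init__(self):
        self._value = None

    def set(self, value):
        # Happens at job-launch time (the 'parameters' dict in the question).
        self._value = value

    def get(self):
        # Fails if resolved before launch; with a staged default it would
        # instead silently return the staging-time value.
        if self._value is None:
            raise RuntimeError('ValueProvider not yet resolved')
        return self._value


read_query = 'SELECT upc, max_review_date FROM `{table}`'
provider = ValueProviderStub()

# Eager: formatting the query while building the pipeline resolves too early.
try:
    query = read_query.format(table=provider.get())
except RuntimeError as exc:
    print(exc)  # ValueProvider not yet resolved

# Deferred: keep the provider, format only when the work actually runs
# (in Beam this would live inside a DoFn's process method).
make_query = lambda: read_query.format(table=provider.get())
provider.set('sentiment_analysis.table_view2')
print(make_query())
# → SELECT upc, max_review_date FROM `sentiment_analysis.table_view2`
```

Until the feature request lands, deferring the formatting in this way (rather than passing a pre-formatted query to BigQuerySource) is the shape of the workaround.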
Source: https://stackoverflow.com/questions/54559079/dataflow-template-job-is-not-taking-input-parameters