How to pass a query parameter to a SQL file using BigQueryOperator


Question


I need to access a parameter passed by BigQueryOperator in a SQL file, but I am getting the error: ERROR - queryParameters argument must have a type <class 'dict'> not <class 'list'>. I am using the code below:

from airflow.contrib.operators import bigquery_operator

t2 = bigquery_operator.BigQueryOperator(
    task_id='bq_from_source_to_clean',
    sql='prepare.sql',
    use_legacy_sql=False,
    allow_large_results=True,
    query_params=[{'name': 'threshold_date',
                   'parameterType': {'type': 'STRING'},
                   'parameterValue': {'value': '2020-01-01'}}],
    destination_dataset_table="{}.{}.{}".format('xxxx',
                                                'xxxx',
                                                'temp_airflow_test'),
    create_disposition="CREATE_IF_NEEDED",
    write_disposition="WRITE_TRUNCATE",
    dag=dag,
)

SQL:

select  cast(DATE_ADD(a.dt_2, interval 7 day) as DATE) as dt_1
,a.dt_2
,cast('2010-01-01' as DATE) as dt_3 
from (select cast(@threshold_date as date) as dt_2) a

I am using Google Cloud Composer version composer-1.7.0-airflow-1.10.2.

Thanks in Advance.


Answer 1:


After diving into the source code, it appears that BigQueryHook had a bug that was fixed in Airflow 1.10.3.

The way you defined query_params is correct for newer versions of Airflow: it should indeed be a list, according to the BigQuery API (see https://cloud.google.com/bigquery/docs/parameterized-queries#bigquery_query_params_named-python).

However, you are getting this error because in Airflow 1.10.2, query_params is defined as a dict; see:

https://github.com/apache/airflow/blob/1.10.2/airflow/contrib/hooks/bigquery_hook.py#L678

query_param_list = [
    ...
    (query_params, 'queryParameters', None, dict),
    ...
]

This causes the internal _validate_value function to throw a TypeError:

https://github.com/apache/airflow/blob/1.10.2/airflow/contrib/hooks/bigquery_hook.py#L1954

def _validate_value(key, value, expected_type):
    """ function to check expected type and raise
    error if type is not correct """
    if not isinstance(value, expected_type):
        raise TypeError("{} argument must have a type {} not {}".format(
            key, expected_type, type(value)))

I did not find any example of query_params usage in Airflow 1.10.2 (nor any unit tests...), but I think that is simply because it is not usable there.

These bugs were fixed by the following commits:

  • https://github.com/apache/airflow/commit/0c797a830e3370bd6e39f5fcfc128a8fd776912e#diff-ee06f8fcbc476ea65446a30160c2a2b2R784 : change dict to list
  • https://github.com/apache/airflow/pull/4876 : update documentation

These changes are included in Airflow 1.10.3, but, as of now, Airflow 1.10.3 is not available in Composer (https://cloud.google.com/composer/docs/concepts/versioning/composer-versions#new_environments): the latest version was released on May 16, 2019 and embeds Airflow 1.10.2.

While waiting for this new version, I see two ways to fix your problem:

  • copy/paste the fixed versions of BigQueryOperator and BigQueryHook into your own sources and use them, or extend the existing BigQueryHook and override the buggy methods. I'm not sure you can patch BigQueryHook directly (you have no access to those files in the Composer environment)
  • templatize your SQL query yourself (and not use query_params), as sketched below
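
A minimal sketch of the second option, reusing the names from the question: the sql field of BigQueryOperator is templated, so values passed through the standard params argument are rendered by Jinja into prepare.sql, replacing the @threshold_date query parameter:

from airflow.contrib.operators import bigquery_operator

# prepare.sql now references the value as a Jinja variable instead of @threshold_date:
#   from (select cast('{{ params.threshold_date }}' as date) as dt_2) a
t2 = bigquery_operator.BigQueryOperator(
    task_id='bq_from_source_to_clean',
    sql='prepare.sql',
    use_legacy_sql=False,
    params={'threshold_date': '2020-01-01'},  # plain Jinja params, not BigQuery query parameters
    destination_dataset_table='xxxx.xxxx.temp_airflow_test',
    create_disposition='CREATE_IF_NEEDED',
    write_disposition='WRITE_TRUNCATE',
    dag=dag,
)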



Answer 2:


This is definitely a bug with Composer (Airflow 1.10.2). We fixed it by pulling down the Airflow files from GitHub, patching bigquery_hook.py, and then referencing the fixed file in bigquery_operator.py (both uploaded to a lib folder). The fixes are:

  1. bigquery_operator.py (line 21)

    from lib.bigquery_hook import BigQueryHook

  2. bigquery_hook.py

    (line 678) (query_params, 'queryParameters', None, list),

    (line 731) if 'useLegacySql' in configuration['query'] and configuration['query']['useLegacySql'] and \

Then, in your DAG, reference the uploaded BQ operator: "from lib.bigquery_operator import BigQueryOperator", as sketched below.
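
For illustration, a minimal sketch of a DAG task using the patched operator (assuming the patched bigquery_operator.py and bigquery_hook.py were uploaded to a lib folder as described above):

from lib.bigquery_operator import BigQueryOperator  # patched copy, not airflow.contrib

t2 = BigQueryOperator(
    task_id='bq_from_source_to_clean',
    sql='prepare.sql',
    use_legacy_sql=False,
    query_params=[{'name': 'threshold_date',
                   'parameterType': {'type': 'STRING'},
                   'parameterValue': {'value': '2020-01-01'}}],  # now accepted as a list
    dag=dag,
)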




Answer 3:


Sharing two ways to pass query params to the BigQuery operator:

  1. Jinja templating - in the query below, '{{ (execution_date - macros.timedelta(hours=1)).strftime('%Y-%m-%d %H:00:00') }}' is a Jinja template that gets resolved at runtime.

    SELECT owner_display_name, title, view_count
    FROM `bigquery-public-data.stackoverflow.posts_questions`
    WHERE creation_date > CAST('{{ (execution_date - macros.timedelta(hours=1)).strftime('%Y-%m-%d %H:00:00') }}' AS TIMESTAMP)
    ORDER BY view_count DESC
    LIMIT 100

  2. query_params - for an IN clause, the parameter type should be ARRAY, and arrayType should be the type of the column in BigQuery (these params are wired into an operator in the sketch after this list).

    query_params = [
        {'name': 'DATE_IN_CLAUSE',
         'parameterType': {'type': 'ARRAY', 'arrayType': {'type': 'TIMESTAMP'}},
         'parameterValue': {'arrayValues': [
             {'value': datetime.utcnow().strftime('%Y-%m-%d %H:00:00')},
             {'value': (datetime.utcnow() - timedelta(hours=1)).strftime('%Y-%m-%d %H:00:00')},
         ]}},
        {'name': 'COUNT',
         'parameterType': {'type': 'INTEGER'},
         'parameterValue': {'value': 1}},
    ]

    SELECT owner_display_name, title, view_count
    FROM `bigquery-public-data.stackoverflow.posts_questions`
    WHERE creation_date IN UNNEST(@DATE_IN_CLAUSE) AND view_count > @COUNT
    ORDER BY view_count DESC
    LIMIT 100

Note - the queries and params above may not give you meaningful results, but they will succeed without any error. These examples are just for demonstrating how to pass params.
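
For completeness, a minimal sketch wiring the second example into the operator (assuming Airflow 1.10.3+, where query_params accepts a list; the task id is a placeholder):

from datetime import datetime, timedelta  # used by the query_params list above

from airflow.contrib.operators.bigquery_operator import BigQueryOperator

bq_task = BigQueryOperator(
    task_id='bq_posts_questions_demo',  # placeholder task id
    sql="""SELECT owner_display_name, title, view_count
           FROM `bigquery-public-data.stackoverflow.posts_questions`
           WHERE creation_date IN UNNEST(@DATE_IN_CLAUSE) AND view_count > @COUNT
           ORDER BY view_count DESC
           LIMIT 100""",
    use_legacy_sql=False,
    query_params=query_params,  # the list defined in example 2 above
    dag=dag,
)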



Source: https://stackoverflow.com/questions/56287061/how-to-pass-query-parameter-to-sql-file-using-bigquery-operator
