Dataproc hive operator not running hql file stored in storage bucket

Asked 2021-01-19 10:54

I am trying to run an hql file stored in Cloud Storage using an Airflow script. There are two parameters through which we can pass the file path to DataprocHiveOperator: query and query_uri.

2 Answers
  • 2021-01-19 11:16

    That's because you are using both query and query_uri.

    If you are querying from a file, you have to use query_uri and set query=None, or simply omit query.

    If you are running an inline query, use query instead (see the inline sketch after the file example below).

    Here is a sample for querying through a file.

    from airflow.contrib.operators.dataproc_operator import DataProcHiveOperator

    # File-based form: the hql lives in a Cloud Storage bucket, referenced via query_uri
    HiveInsertingTable = DataProcHiveOperator(
        task_id='HiveInsertingTable',
        gcp_conn_id='google_cloud_default',
        query_uri="gs://us-central1-bucket/data/sample_hql.sql",
        cluster_name='cluster-name',
        region='us-central1',
        dag=dag)
    
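    For comparison, here is a minimal sketch of the inline-query form. The task id and HQL statement are placeholders, not from the original question:

    # Inline form: pass the HQL text directly via `query` instead of query_uri
    HiveInlineQuery = DataProcHiveOperator(
        task_id='HiveInlineQuery',
        gcp_conn_id='google_cloud_default',
        query="SHOW DATABASES;",  # placeholder statement for illustration
        cluster_name='cluster-name',
        region='us-central1',
        dag=dag)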
  • 2021-01-19 11:25

    query_uri is indeed the correct parameter for running an hql file off of Cloud Storage. However, it was only added to DataProcHiveOperator in https://github.com/apache/incubator-airflow/pull/2402. Based on the warning message you got, I don't think you're running code that supports the parameter. The change is not in the latest release (1.8.2), so you'll need to wait for another release or grab it off the master branch.
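
    As a quick sanity check, you can confirm which Airflow version your environment is actually running (a minimal sketch; the point is just that anything at or below 1.8.2 lacks query_uri support):

    import airflow

    # Prints the installed Airflow version; query_uri on DataProcHiveOperator
    # landed after the 1.8.2 release, so versions <= 1.8.2 will not accept it.
    print(airflow.__version__)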
