Airflow BigQueryOperator: how to save query result in a partitioned Table?

Backend · Unresolved · 4 answers · 2043 views

Asked by 长情又很酷 on 2021-01-01 00:23

I have a simple DAG

from airflow import DAG
from airflow.contrib.operators.bigquery_operator import BigQueryOperator

with DAG(dag_id='my_dags.my_dag') as dag:
    # ... (the rest of the DAG definition was cut off in the original post)
4 Answers
  •  Answered by 再見小時候 on 2021-01-01 01:12

    The main issue here is that I don't have access to a newer version of the google-cloud Python API; production is pinned to version 0.27.0. So, to get the job done, I did something bad and dirty:

    • saved the query result in a sharded table, call it table_sharded
    • got table_sharded's schema, call it table_schema
    • saved a "SELECT * FROM dataset.table_sharded" query to a partitioned table, providing table_schema
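    The last step writes into a specific day partition by appending BigQuery's `$YYYYMMDD` partition decorator to the destination table name. A minimal helper (the function name is mine, not from the post) to build that string:

    ```python
    from datetime import date

    def partition_decorator(dataset: str, table: str, day: date) -> str:
        """Build a partition-decorated destination like 'dataset.table$20210101',
        BigQuery's convention for addressing a single day partition."""
        return f"{dataset}.{table}${day.strftime('%Y%m%d')}"
    ```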

    All of this is abstracted in a single operator that uses a hook. The hook is responsible for creating/deleting tables/partitions, getting table schemas, and running queries on BigQuery.

    Have a look at the code. If there is any other solution, please let me know.
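    The linked code isn't included in the post, so here is a rough, hypothetical sketch of what such a hook-backed workaround could look like. All names are invented, and the `client` object stands in for the real BigQuery API calls (which on the old 0.27.0 client would go through `google.cloud.bigquery` job requests):

    ```python
    # Sketch of the three-step workaround described above. Hypothetical names;
    # `client` is any object exposing run_query / get_table_schema / delete_table.
    class PartitionedSaveHook:
        def __init__(self, client):
            self.client = client

        def save_query_to_partition(self, sql, dataset, table, partition):
            sharded = f"{table}_sharded"
            # 1. materialise the query result in a temporary sharded table
            self.client.run_query(sql, destination=f"{dataset}.{sharded}")
            # 2. read back the sharded table's schema
            schema = self.client.get_table_schema(dataset, sharded)
            # 3. re-select everything into the partitioned destination,
            #    passing the schema explicitly (needed on the old API)
            self.client.run_query(
                f"SELECT * FROM {dataset}.{sharded}",
                destination=f"{dataset}.{table}${partition}",
                schema=schema,
            )
            # clean up the temporary sharded table
            self.client.delete_table(dataset, sharded)
            return schema
    ```

    An Airflow operator would then simply instantiate this hook in its `execute` method; the sketch keeps the client pluggable so the orchestration logic can be exercised without a live BigQuery connection.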
