airflow

Passing arguments to sql template from airflow operator

Submitted by 雨燕双飞 on 2020-01-30 08:58:35
Question: If I am using a BigQueryOperator with a SQL template, how can I pass an argument to the SQL? File ./sql/query.sql: SELECT * FROM `dataset.{{ task_instance.variable_for_execution }} File dag.py: BigQueryOperator( task_id='compare_tables', sql='./sql/query.sql', use_legacy_sql=False, dag=dag, ) Answer 1: You can pass an argument in the params parameter, which can then be used in the templated field as follows: BigQueryOperator( task_id='', sql='SELECT * FROM `dataset.{{ params.param1 }}', params={ 'param1'
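The answer's pattern can be sketched end to end as follows, assuming Airflow 1.10's contrib BigQueryOperator; the dataset, table name, and parameter value below are placeholders, not from the original question.

```python
from airflow.contrib.operators.bigquery_operator import BigQueryOperator

# ./sql/query.sql would contain, for example:
#   SELECT * FROM `my_dataset.{{ params.table_name }}`
compare_tables = BigQueryOperator(
    task_id='compare_tables',
    sql='./sql/query.sql',               # rendered by Jinja because `sql` is a templated field
    use_legacy_sql=False,
    params={'table_name': 'my_table'},   # available in the template as {{ params.table_name }}
    dag=dag,
)
```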

BigQuery: Some rows belong to different partitions rather than destination partition

Submitted by 牧云@^-^@ on 2020-01-25 09:20:28
Question: I am running an Airflow DAG which moves data from GCS to BQ using the GoogleCloudStorageToBigQueryOperator operator; I am on Airflow version 1.10.2. This task moves data from MySQL to BQ (a partitioned table). All this time we partitioned by ingestion time, and the incremental load for the past three days worked fine when the data was loaded using the Airflow DAG. Now we changed the partitioning type to date or timestamp on a DATE column of the table, after which we have started getting this
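BigQuery typically raises "Some rows belong to different partitions rather than destination partition" when a load targets a single partition decorator (table$YYYYMMDD) but the file contains values of the partitioning column that fall into other partitions. A rough sketch of the kind of load described above is shown below, assuming the 1.10.2 contrib operator supports the time_partitioning argument; the bucket, object path, table, and column names are placeholders.

```python
from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator

load_to_bq = GoogleCloudStorageToBigQueryOperator(
    task_id='gcs_to_bq_incremental',
    bucket='my-bucket',
    source_objects=['exports/orders/*.json'],
    source_format='NEWLINE_DELIMITED_JSON',
    # Target the base table (no `$YYYYMMDD` partition decorator) so each row can
    # land in whichever partition its DATE column points to.
    destination_project_dataset_table='my_project.my_dataset.orders',
    write_disposition='WRITE_APPEND',
    time_partitioning={'type': 'DAY', 'field': 'order_date'},  # column-based partitioning
    dag=dag,
)
```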

GoogleCloudStorageDownloadOperator “Task exited with return code -6”

Submitted by 耗尽温柔 on 2020-01-25 06:46:27
Question: I am new to Airflow and I am trying something simple with GoogleCloudStorageDownloadOperator: default_args = { 'start_date': airflow.utils.dates.days_ago(0), 'schedule_interval': None, 'retries': 1, 'retry_delay': timedelta(minutes=5), 'params': { 'work_dir': '/tmp' } } dag = DAG( 'foo', default_args=default_args, description='This is foobar', schedule_interval=timedelta(weeks=1), dagrun_timeout=timedelta(minutes=60)) mock_download = GoogleCloudStorageDownloadOperator( task_id='download-foo
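A return code of -6 generally means the task process was terminated by signal 6 (SIGABRT) rather than failing with a Python exception. For reference, a minimal sketch of the operator usage in question is below, assuming the Airflow 1.10 contrib import path; the bucket, object, and local path are placeholders.

```python
from airflow.contrib.operators.gcs_download_operator import GoogleCloudStorageDownloadOperator

mock_download = GoogleCloudStorageDownloadOperator(
    task_id='download-foo',
    bucket='my-bucket',
    object='foo/bar.csv',                                # object path inside the bucket
    filename='/tmp/bar.csv',                             # local destination on the worker
    google_cloud_storage_conn_id='google_cloud_default',
    dag=dag,
)
```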

DAG seems to be missing

Submitted by 我是研究僧i on 2020-01-24 20:59:06
Question: I have a DAG which checks for new workflows to be generated (dynamic DAGs) at a regular interval and, if found, creates them. (Ref: Dynamic dags not getting added by scheduler) The above DAG is working and the dynamic DAGs are getting created and listed in the web server. Two issues here: when clicking on the DAG in the web URL, it says "DAG seems to be missing"; the listed DAGs are not listed using the "airflow list_dags" command. Error: DAG "app01_user" seems to be missing. The same is for all other
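The "DAG seems to be missing" error usually indicates that the webserver cannot re-import the DAG by its dag_id from a file in the dags folder. A common pattern for dynamic DAGs, shown as a sketch below and not necessarily the poster's setup, is to register every generated DAG in the generator module's globals so that each parse of the file recreates it; the dag_id list stands in for however new workflows are actually discovered.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

def create_dag(dag_id):
    dag = DAG(dag_id, start_date=datetime(2020, 1, 1), schedule_interval='@daily')
    DummyOperator(task_id='start', dag=dag)
    return dag

# Both the scheduler and the webserver must be able to rebuild each DAG by name
# every time they parse this file, hence the assignment into globals().
for dag_id in ['app01_user', 'app02_user']:
    globals()[dag_id] = create_dag(dag_id)
```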

Fetch results from BigQueryOperator in airflow

Submitted by 一世执手 on 2020-01-24 20:13:30
Question: I am trying to fetch results from BigQueryOperator using Airflow but I could not find a way to do it. I tried calling the next() method on the bq_cursor member (available in 1.10), however it returns None. This is how I tried to do it: import datetime import logging from airflow import models from airflow.contrib.operators import bigquery_operator from airflow.operators import python_operator yesterday = datetime.datetime.combine( datetime.datetime.today() - datetime.timedelta(1), datetime
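Since the goal is to get rows back into Python, one common workaround, sketched below under the assumption of Airflow 1.10 contrib import paths, is to run the query from a PythonOperator through BigQueryHook, whose cursor exposes the usual DB-API style methods; the SQL, connection id, and table name are placeholders.

```python
import logging

from airflow.contrib.hooks.bigquery_hook import BigQueryHook
from airflow.operators import python_operator

def fetch_rows(**context):
    hook = BigQueryHook(bigquery_conn_id='bigquery_default', use_legacy_sql=False)
    cursor = hook.get_conn().cursor()
    cursor.execute('SELECT name, value FROM `my_dataset.my_table` LIMIT 10')
    for row in cursor.fetchall():
        logging.info('row: %s', row)

fetch_task = python_operator.PythonOperator(
    task_id='fetch_rows',
    python_callable=fetch_rows,
    provide_context=True,
    dag=dag,
)
```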

Airflow task running tweepy exits with return code -6

Submitted by ╄→гoц情女王★ on 2020-01-24 19:31:15
Question: I have a simple Airflow DAG which has only one task - stream_from_twitter_to_kafka. Here is the code for the DAG: default_args = { "owner": "me", "depends_on_past": False, "start_date": datetime(2020, 1, 20), "email": ["makalaaneesh18@mail.com"], "email_on_failure": False, "email_on_retry": False, "retries": 0, "retry_delay": timedelta(minutes=1), } NO_OF_TWEETS_TO_STREAM = 100 with DAG("stream_from_twitter", catchup=False, default_args=default_args, schedule_interval="@hourly") as dag: task1
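As in the earlier question, a return code of -6 generally means the task process was killed by signal 6 (SIGABRT) rather than raising a Python exception. For reference, the callable behind such a task might look roughly like the sketch below, assuming tweepy 3.x and kafka-python; the credentials, Kafka topic, and tweet limit are placeholders, not the poster's actual code.

```python
import json

import tweepy
from kafka import KafkaProducer

NO_OF_TWEETS_TO_STREAM = 100

class BoundedListener(tweepy.StreamListener):
    """Stops the stream after a fixed number of tweets so the task can exit cleanly."""
    def __init__(self, producer, limit):
        super().__init__()
        self.producer = producer
        self.limit = limit
        self.count = 0

    def on_status(self, status):
        self.producer.send('tweets', json.dumps(status._json).encode('utf-8'))
        self.count += 1
        return self.count < self.limit  # returning False disconnects the stream

def stream_from_twitter_to_kafka():
    producer = KafkaProducer(bootstrap_servers='localhost:9092')
    auth = tweepy.OAuthHandler('CONSUMER_KEY', 'CONSUMER_SECRET')    # placeholder credentials
    auth.set_access_token('ACCESS_TOKEN', 'ACCESS_TOKEN_SECRET')     # placeholder credentials
    stream = tweepy.Stream(auth=auth, listener=BoundedListener(producer, NO_OF_TWEETS_TO_STREAM))
    stream.filter(track=['airflow'])
    producer.flush()
```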

How do I set up Airflow across 2 servers?

Submitted by 不想你离开。 on 2020-01-24 18:33:12
Question: Trying to split out Airflow processes onto 2 servers. Server A, which has already been running in standalone mode with everything on it, has the DAGs, and I'd like to set it as the worker in the new setup with an additional server. Server B is the new server, which would host the metadata database on MySQL. Can I have Server A run LocalExecutor, or would I have to use CeleryExecutor? The airflow scheduler would have to run on the server that has the DAGs, right? Or does it have to run on every server
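One way to read the split being described (an assumption, not a confirmed answer): if Server B only hosts the MySQL metadata database and all Airflow processes (webserver, scheduler, and task execution) stay on Server A, then LocalExecutor on Server A can suffice, with CeleryExecutor typically reserved for spreading task execution itself across multiple worker machines. Server A's airflow.cfg might then look roughly like the sketch below, with the host name and credentials as placeholders.

```ini
[core]
executor = LocalExecutor
dags_folder = /home/airflow/dags
# The metadata database lives on Server B
sql_alchemy_conn = mysql://airflow:airflow_password@server-b:3306/airflow
```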
