airflow

Airflow - Disable heartbeat logs

家住魔仙堡, submitted on 2020-01-22 21:29:07
Question: My logs are getting completely flooded with useless messages for every heartbeat.

    [2019-11-27 21:32:47,890] {{logging_mixin.py:112}} INFO - [2019-11-27 21:32:47,889] {local_task_job.py:124} WARNING - Time since last heartbeat(0.02 s) < heartrate(5.0 s), sleeping for 4.983326 s
    [2019-11-27 21:32:52,921] {{logging_mixin.py:112}} INFO - [2019-11-27 21:32:52,921] {local_task_job.py:124} WARNING - Time since last heartbeat(0.02 s) < heartrate(5.0 s), sleeping for 4.984673 s
    [2019-11-27 21:32:57
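A possible mitigation, sketched under assumptions rather than confirmed for any specific Airflow version: raise the log level of the logger that emits the heartbeat warning, for example from a custom logging config or airflow_local_settings. The logger names below are guesses and may differ between releases; inspect logging.root.manager.loggerDict on your installation to confirm them.

    import logging

    # Silence the "Time since last heartbeat ..." warnings by raising the
    # level of the emitting logger to ERROR. Both names are assumptions
    # for Airflow 1.10.x; verify against logging.root.manager.loggerDict.
    for name in ("airflow.jobs.local_task_job.LocalTaskJob",
                 "airflow.jobs.local_task_job"):
        logging.getLogger(name).setLevel(logging.ERROR)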

Airflow BigQueryOperator: how to save query result in a partitioned Table?

不想你离开。, submitted on 2020-01-22 12:47:33
Question: I have a simple DAG:

    from airflow import DAG
    from airflow.contrib.operators.bigquery_operator import BigQueryOperator

    with DAG(dag_id='my_dags.my_dag') as dag:
        start = DummyOperator(task_id='start')
        end = DummyOperator(task_id='end')
        sql = """ SELECT * FROM 'another_dataset.another_table' """
        bq_query = BigQueryOperator(bql=sql,
                                    destination_dataset_table='my_dataset.my_table20180524',
                                    task_id='bq_query',
                                    bigquery_conn_id='my_bq_connection',
                                    use_legacy_sql=False,
                                    write_disposition='WRITE
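A sketch of one way this is commonly handled, assuming an ingestion-time partitioned destination table: BigQuery lets you address a single partition with the $YYYYMMDD decorator on the table name, so the operator call might look like the following. The sql and dag variables are the ones from the question's snippet; whether your Airflow version's BigQueryOperator accepts every argument shown should be checked against its documentation.

    from airflow.contrib.operators.bigquery_operator import BigQueryOperator

    bq_query = BigQueryOperator(
        task_id='bq_query',
        bql=sql,
        # "$20180524" targets that day's partition of a partitioned table
        destination_dataset_table='my_dataset.my_table$20180524',
        bigquery_conn_id='my_bq_connection',
        use_legacy_sql=False,
        write_disposition='WRITE_TRUNCATE',
        dag=dag,
    )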

How to Run a Simple Airflow DAG

本秂侑毒, submitted on 2020-01-22 10:42:06
Question: I am totally new to Airflow. I would like to run a simple DAG on a specified date. I'm struggling to understand the difference between the start date, the execution date, and backfilling. And what is the command to run the DAG? Here is what I've tried so far:

    airflow run dag_1 task_1 2017-1-23

The first time I ran that command, the task executed correctly, but when I tried again it did not work. Here is another command I ran:

    airflow backfill dag_1 -s 2017-1-23 -e 2017-1-24

I don't know what to expect
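For orientation, a minimal sketch of the usual pattern, assuming the Airflow 1.x CLI the question uses: give the DAG a start_date in the past, then either run a single task for one execution_date or backfill a date range.

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    dag = DAG(
        dag_id='dag_1',
        start_date=datetime(2017, 1, 23),  # earliest execution_date the scheduler considers
        schedule_interval='@daily',
    )

    task_1 = BashOperator(task_id='task_1', bash_command='echo hello', dag=dag)

    # Airflow 1.x CLI:
    #   airflow run dag_1 task_1 2017-01-23                   # one task, one execution_date
    #   airflow backfill dag_1 -s 2017-01-23 -e 2017-01-24    # all scheduled runs in the range

Re-running airflow run for an execution_date whose task instance already succeeded is typically skipped unless the task instance is cleared or --force is passed, which may be why the second attempt appeared to do nothing.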

How to use apache airflow in a virtual environment?

对着背影说爱祢, submitted on 2020-01-17 01:05:52
Question: I am quite new to using Apache Airflow. I use PyCharm as my IDE. I create a project (an Anaconda environment) and a Python script that includes DAG definitions and Bash operators. When I open my Airflow webserver, my DAGs are not shown; only the default example DAGs are shown. My AIRFLOW_HOME variable contains ~/airflow, so I stored my Python script there and now it shows up. How do I use this in a project environment? Do I change the environment variable at the start of every project? Is
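One way to see what is happening, sketched under the assumption of Airflow 1.10.x: print the dags_folder the running installation actually scans, then either point it at the project or give each project its own AIRFLOW_HOME before starting the webserver and scheduler.

    from airflow.configuration import conf

    # Where this Airflow installation looks for DAG files.
    print(conf.get('core', 'dags_folder'))   # e.g. ~/airflow/dags

    # If that is not your project directory, either edit dags_folder in
    # $AIRFLOW_HOME/airflow.cfg to point at the project's dags/ folder,
    # or export a project-local AIRFLOW_HOME in the activated environment
    # before launching `airflow webserver` and `airflow scheduler`.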

Airflow unable to iterate through xcom_pull list with Google Cloud Operators

假装没事ソ, submitted on 2020-01-16 09:07:56
Question: I would like to dynamically get the list of CSV files in a GCS bucket and then dump each one to a corresponding BQ table. I am using the GoogleCloudStorageListOperator and GoogleCloudStorageToBigQueryOperator operators:

    GCS_Files = GoogleCloudStorageListOperator(
        task_id='GCS_Files',
        bucket=cf.storage.import_bucket_name,
        prefix='20190701/',
        delimiter='.csv',
        dag=dag
    )

    for idx, elem in enumerate(["{{ task_instance.xcom_pull(task_ids='GCS_Files') }}"]):
        storage_to_bigquery =
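A sketch of one common workaround, offered as an assumption rather than a verified answer: the "{{ task_instance.xcom_pull(...) }}" expression is only rendered at task run time, so the for loop above iterates over a literal template string while the DAG file is parsed. Moving the iteration into a PythonOperator callable, where the XCom value is a real Python list, avoids that.

    from airflow.operators.python_operator import PythonOperator

    def load_listed_files(**context):
        # At runtime this is the actual list returned by the GCS_Files task.
        gcs_objects = context['ti'].xcom_pull(task_ids='GCS_Files')
        for obj in gcs_objects:
            # Replace with the real load step, e.g. a BigQuery hook/client
            # call that loads gs://<bucket>/<obj> into its target table.
            print('would load', obj)

    storage_to_bigquery = PythonOperator(
        task_id='storage_to_bigquery',
        python_callable=load_listed_files,
        provide_context=True,   # Airflow 1.x; implicit in 2.x
        dag=dag,
    )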

Airflow gets stuck in last task of DAG after several executions

吃可爱长大的小学妹, submitted on 2020-01-16 08:59:40
Question: I have a DAG which is formed by 7 tasks. I have executed it many times, but lately it is getting stuck in the last task, which is a very simple Python operator, as follows:

    def send_email(warnings):
        warnings = ast.literal_eval(warnings)
        warnings_list = '\n'.join(warnings)
        email_message = f"""Good morning, the past week there were some performance issues, which were the following ones: \n {warnings_list} Have a nice day!"""
        send_email_smtp(to = 'email@email.com', subject = 'Warning', html_content
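Not a diagnosis, but one defensive change worth sketching: if the SMTP call inside the callable can hang, an execution_timeout lets Airflow fail the task instead of leaving the run stuck indefinitely. Operator and argument names follow Airflow 1.10.x; the callable is the one from the question, and the upstream task id used in op_args is hypothetical.

    from datetime import timedelta
    from airflow.operators.python_operator import PythonOperator

    send_warnings_email = PythonOperator(
        task_id='send_email',
        python_callable=send_email,               # the function shown above
        op_args=["{{ ti.xcom_pull(task_ids='collect_warnings') }}"],  # hypothetical upstream task
        execution_timeout=timedelta(minutes=5),   # fail instead of hanging forever
        retries=1,
        dag=dag,
    )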

execution_date jinja resolving as a string

怎甘沉沦, submitted on 2020-01-16 08:59:24
Question: I have an Airflow DAG that uses the following Jinja template:

    "{{ execution_date.astimezone('Etc/GMT+6').subtract(days=1).strftime('%Y-%m-%dT00:00:00') }}"

This template works in other DAGs, and it works when the schedule_interval for the DAG is set to timedelta(hours=1). However, when we set the schedule interval to 0 8 * * *, it throws the following traceback at runtime:

    Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/airflow/models/__init__.py", line 1426, in
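A defensive workaround to sketch, under the assumption (not confirmed here) that execution_date is sometimes handed over as a plain string rather than a pendulum/datetime object: instead of chaining pendulum methods inside the Jinja expression, pass the raw execution_date into Python and normalise it first. The helper name is illustrative; it could be used from a PythonOperator or registered as a user_defined_macro on the DAG.

    import pendulum

    def previous_day_gmt6(execution_date):
        """Return the prior day at midnight in Etc/GMT+6, tolerating strings."""
        if isinstance(execution_date, str):        # some code paths hand over a string
            execution_date = pendulum.parse(execution_date)
        local = pendulum.instance(execution_date).in_timezone('Etc/GMT+6')
        return local.subtract(days=1).strftime('%Y-%m-%dT00:00:00')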

Using XCom to Load Schema in Airflow with GoogleCloudStorageToBigQueryOperator

笑着哭i, submitted on 2020-01-15 10:15:37
Question: I have an XCom associated with the task ID database_schema stored in Airflow that is the JSON schema for a dataset sales_table that I want to load into BigQuery. The data for the BigQuery dataset sales_table comes from a CSV file retailcustomer_data.csv stored in Google Cloud Storage. The operator for loading the data from GCS to BigQuery is as follows:

    gcs_to_bigquery = GoogleCloudStorageToBigQueryOperator(
        task_id = 'gcs_to_bigquery',
        bucket = bucket,
        source_objects = ['retailcustomer_data
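A sketch of one workaround, offered as an assumption rather than a verified answer: in Airflow 1.10.x, schema_fields is not among the operator's templated fields, so an XCom Jinja expression passed there would arrive unrendered. A thin subclass can make it templated and parse the rendered text back into a list of field dicts. The bucket variable is the one from the question; the destination table name is only an example.

    import ast
    from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator

    class GCSToBQWithXComSchema(GoogleCloudStorageToBigQueryOperator):
        # Add schema_fields to the templated fields so the XCom pull is rendered
        # (assumes template_fields is a tuple in your Airflow version).
        template_fields = GoogleCloudStorageToBigQueryOperator.template_fields + ('schema_fields',)

        def execute(self, context):
            if isinstance(self.schema_fields, str):
                # Rendered XCom arrives as text; parse it back into a Python list.
                self.schema_fields = ast.literal_eval(self.schema_fields)
            return super(GCSToBQWithXComSchema, self).execute(context)

    gcs_to_bigquery = GCSToBQWithXComSchema(
        task_id='gcs_to_bigquery',
        bucket=bucket,
        source_objects=['retailcustomer_data.csv'],
        destination_project_dataset_table='my_dataset.sales_table',   # example name
        schema_fields="{{ task_instance.xcom_pull(task_ids='database_schema') }}",
        write_disposition='WRITE_TRUNCATE',
        dag=dag,
    )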