airflow

How to nest an Airflow DAG dynamically?

流过昼夜 submitted on 2019-11-30 22:18:21
I have a simple DAG of three operators. The first one is a PythonOperator with our own functionality; the other two are standard operators from airflow.contrib (FileToGoogleCloudStorageOperator and GoogleCloudStorageToBigQueryOperator, to be precise). They run in sequence. Our custom task produces a number of files, typically between 2 and 5, depending on the parameters. All of these files have to be processed by the subsequent tasks separately. That means I want several downstream branches, but it's unknowable exactly how many before the DAG is run. How would you approach this problem? UPDATE:
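A common workaround in Airflow 1.x, assuming the possible file names (or at least a maximum count) can be determined at DAG-parse time, is to generate one downstream branch per file in a loop. This is only a sketch: the file list, task names and per-file callable are hypothetical, and in the question's setup the per-file work would be the FileToGoogleCloudStorageOperator / GoogleCloudStorageToBigQueryOperator pair rather than a PythonOperator.

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    # Hypothetical: the files the custom task can produce, known at parse time.
    EXPECTED_FILES = ['part_1.csv', 'part_2.csv', 'part_3.csv']

    dag = DAG('dynamic_branches',
              start_date=datetime(2018, 1, 1),
              schedule_interval='@daily')

    def produce_files(**kwargs):
        # stands in for the custom task that writes the files
        pass

    def process_file(filename, **kwargs):
        # stands in for the per-file upload/load work
        print('processing %s' % filename)

    produce = PythonOperator(
        task_id='produce_files',
        python_callable=produce_files,
        provide_context=True,
        dag=dag)

    for filename in EXPECTED_FILES:
        branch = PythonOperator(
            task_id='process_%s' % filename.replace('.', '_'),
            python_callable=process_file,
            op_kwargs={'filename': filename},
            provide_context=True,
            dag=dag)
        produce >> branch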

Can't import Airflow plugins

孤街浪徒 submitted on 2019-11-30 21:36:06
Problem: I am following the Airflow tutorial here. The webserver returns the following error: Broken DAG: [/usr/local/airflow/dags/test_operator.py] cannot import name MyFirstOperator. Notes: The directory structure looks like this:

    airflow_home
    ├── airflow.cfg
    ├── airflow.db
    ├── dags
    │   └── test_operators.py
    ├── plugins
    │   └── my_operators.py
    └── unittests.cfg

I am attempting to import the plugin in 'test_operators.py' like this: from airflow.operators import MyFirstOperator. The code is all the same as
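For reference, the Airflow 1.x plugin mechanism expects the module in plugins/ to register the operator through an AirflowPlugin subclass; if that registration is missing or misnamed, the import in the DAG file fails with exactly this error. A minimal sketch of what my_operators.py usually looks like (the operator body and its parameter are placeholders, and the exact import path exposed under airflow.operators varies by Airflow version):

    # plugins/my_operators.py -- sketch of the plugin registration
    import logging

    from airflow.models import BaseOperator
    from airflow.plugins_manager import AirflowPlugin
    from airflow.utils.decorators import apply_defaults

    class MyFirstOperator(BaseOperator):
        @apply_defaults
        def __init__(self, my_param=None, *args, **kwargs):
            super(MyFirstOperator, self).__init__(*args, **kwargs)
            self.my_param = my_param

        def execute(self, context):
            logging.info('MyFirstOperator running with my_param=%s', self.my_param)

    class MyFirstPlugin(AirflowPlugin):
        name = 'my_first_plugin'
        # Makes the operator importable from airflow.operators
        operators = [MyFirstOperator]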

How to pass parameter to PythonOperator in Airflow

纵然是瞬间 submitted on 2019-11-30 21:32:06
I just started using Airflow; can anyone enlighten me on how to pass a parameter into a PythonOperator, like below:

    t5_send_notification = PythonOperator(
        task_id='t5_send_notification',
        provide_context=True,
        python_callable=SendEmail,
        op_kwargs=None,
        #op_kwargs=(key1='value1', key2='value2'),
        dag=dag,
    )

    def SendEmail(**kwargs):
        msg = MIMEText("The pipeline for client1 is completed, please check.")
        msg['Subject'] = "xxxx"
        msg['From'] = "xxxx"
        ......
        s = smtplib.SMTP('localhost')
        s.send_message(msg)
        s.quit()

I would like to be able to pass some parameters into the t5_send_notification's callable
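For what it's worth, op_kwargs is normally given as a dict rather than a tuple of assignments, and its keys arrive as keyword arguments in the callable. A minimal sketch reusing the names from the question (the sender, recipient and parameter values are placeholders, and dag is assumed to be defined elsewhere as in the question):

    import smtplib
    from email.mime.text import MIMEText

    from airflow.operators.python_operator import PythonOperator

    def SendEmail(client=None, subject=None, **kwargs):
        # 'client' and 'subject' come from op_kwargs; provide_context=True adds
        # the usual context entries (ds, execution_date, ...) to **kwargs.
        msg = MIMEText("The pipeline for %s is completed, please check." % client)
        msg['Subject'] = subject
        msg['From'] = 'airflow@example.com'   # placeholder sender
        msg['To'] = 'team@example.com'        # placeholder recipient
        s = smtplib.SMTP('localhost')
        s.send_message(msg)
        s.quit()

    t5_send_notification = PythonOperator(
        task_id='t5_send_notification',
        provide_context=True,
        python_callable=SendEmail,
        op_kwargs={'client': 'client1', 'subject': 'Pipeline finished'},
        dag=dag,
    )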

Airflow: Why is there a start_date for operators?

故事扮演 submitted on 2019-11-30 20:31:58
I don't understand why we need a 'start_date' for the operators (task instances). Shouldn't the one that we pass to the DAG suffice? Also, if the current time is 7 Feb 2018 8:30 am UTC and I now set the start_date of the DAG to 7 Feb 2018 0:00 am, with my cron expression for the schedule interval being 30 9 * * * (daily at 9:30 am, i.e. expecting it to run within the next hour), will my DAG run today at 9:30 am or tomorrow (8 Feb at 9:30 am)? Regarding start_date on a task instance: personally I have never used this, I always just have a single DAG start_date. However, from what I can see this would
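As a point of reference, the usual pattern is to put a single start_date in default_args so every task inherits it; a per-operator start_date is rarely set. A minimal sketch, assuming Airflow 1.x scheduling semantics (where a run stamped with a given execution_date is only triggered once that schedule interval has closed); the dag_id and task are placeholders:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator

    default_args = {
        'owner': 'airflow',
        # A single start_date here is inherited by every task in the DAG.
        'start_date': datetime(2018, 2, 7),
    }

    dag = DAG(
        dag_id='start_date_example',
        default_args=default_args,
        schedule_interval='30 9 * * *',  # daily at 09:30
    )

    # The run stamped 2018-02-07T09:30 covers that day's interval and is
    # therefore triggered only around 2018-02-08T09:30 in this model.
    noop = DummyOperator(task_id='noop', dag=dag)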

Jobs not executing via Airflow running Celery with RabbitMQ

淺唱寂寞╮ submitted on 2019-11-30 18:35:21
Problem: Below is the config I'm using:

    [core]
    # The home folder for airflow, default is ~/airflow
    airflow_home = /root/airflow
    # The folder where your airflow pipelines live, most likely a
    # subfolder in a code repository
    dags_folder = /root/airflow/dags
    # The folder where airflow should store its log files. This location
    base_log_folder = /root/airflow/logs
    # An S3 location can be provided for log backups
    # For S3, use the full URL to the base folder (starting with "s3://...")
    s3_log_folder = None
    #
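The truncated excerpt does not show the executor or broker settings, which are the parts that usually matter when Celery workers sit behind RabbitMQ. For reference, a sketch of the keys typically involved in an Airflow 1.x airflow.cfg (the host, credentials and result backend below are placeholders, and the result-backend key was renamed from celery_result_backend to result_backend in later 1.10 releases):

    [core]
    executor = CeleryExecutor

    [celery]
    # RabbitMQ broker; user, password, host and vhost are placeholders.
    broker_url = amqp://guest:guest@localhost:5672//
    # Where Celery stores task state; a database URL is common.
    celery_result_backend = db+postgresql://airflow:airflow@localhost/airflow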

Broken DAG: (…) No module named docker

醉酒当歌 submitted on 2019-11-30 17:07:53
Problem: I have the BigQuery connectors all running, but I have some existing scripts in Docker containers that I wish to schedule on Cloud Composer instead of App Engine Flexible. I have the script below, which seems to follow the examples I can find:

    import datetime
    from airflow import DAG
    from airflow import models
    from airflow.operators.docker_operator import DockerOperator

    yesterday = datetime.datetime.combine(
        datetime.datetime.today() - datetime.timedelta(1),
        datetime.datetime.min.time())

    default_args = {
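For context, DockerOperator imports the docker Python package at runtime, so "No module named docker" usually just means that package is not installed in the environment (on Cloud Composer it would be added through the environment's PyPI packages settings). A sketch of the kind of task the question is building toward; the image name, command and DAG object are placeholders:

    from airflow.operators.docker_operator import DockerOperator

    run_script = DockerOperator(
        task_id='run_my_container',
        image='gcr.io/my-project/my-script:latest',  # placeholder image
        command='python /app/main.py',               # placeholder command
        api_version='auto',
        dag=dag,  # assumes a DAG defined as in the question's (truncated) code
    )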

Airflow failed slack message

|▌冷眼眸甩不掉的悲伤 submitted on 2019-11-30 16:26:40
How can I configure Airflow so that any failure in the DAG will (immediately) result in a Slack message? At the moment I manage it by creating a slack_failed_task:

    slack_failed_task = SlackAPIPostOperator(
        task_id='slack_failed',
        channel="#datalabs",
        trigger_rule='one_failed',
        token="...",
        text=':red_circle: DAG Failed',
        icon_url='http://airbnb.io/img/projects/airflow3.png',
        dag=dag)

And I set this task (one_failed) downstream of every other task in the DAG:

    slack_failed_task << download_task_a
    slack_failed_task << download_task_b
    slack_failed_task << process_task_c
    slack_failed_task <<
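A common alternative, sketched below, is to attach an on_failure_callback through default_args so every task reports its own failure without extra edges in the graph. The channel is a placeholder, the token is left elided as in the question, and calling execute() on an operator from inside a callback is a pragmatic shortcut rather than the only way to post to Slack:

    from airflow.operators.slack_operator import SlackAPIPostOperator

    def notify_slack_on_failure(context):
        # Build and fire a Slack post for the task instance that just failed.
        ti = context['task_instance']
        alert = SlackAPIPostOperator(
            task_id='slack_failed_alert',
            channel='#datalabs',
            token='...',  # kept elided, as in the question
            text=':red_circle: Task %s failed in DAG %s' % (ti.task_id, ti.dag_id),
        )
        return alert.execute(context=context)

    default_args = {
        'owner': 'airflow',
        'on_failure_callback': notify_slack_on_failure,  # applied to every task
    }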

How to set up Airflow Send Email?

ぐ巨炮叔叔 submitted on 2019-11-30 14:44:47
Problem: I followed an online tutorial to set up the email SMTP server in airflow.cfg as below:

    [email]
    email_backend = airflow.utils.email.send_email_smtp

    [smtp]
    # If you want airflow to send emails on retries, failure, and you want to use
    # the airflow.utils.email.send_email_smtp function, you have to configure an
    # smtp server here
    smtp_host = smtp.gmail.com
    smtp_starttls = True
    smtp_ssl = False
    # Uncomment and set the user/pass settings if you want to use SMTP AUTH
    # smtp_user =
    # smtp_password =
    smtp

How to set up Airflow Send Email?

安稳与你 submitted on 2019-11-30 13:08:56
I followed an online tutorial to set up the email SMTP server in airflow.cfg as below:

    [email]
    email_backend = airflow.utils.email.send_email_smtp

    [smtp]
    # If you want airflow to send emails on retries, failure, and you want to use
    # the airflow.utils.email.send_email_smtp function, you have to configure an
    # smtp server here
    smtp_host = smtp.gmail.com
    smtp_starttls = True
    smtp_ssl = False
    # Uncomment and set the user/pass settings if you want to use SMTP AUTH
    # smtp_user =
    # smtp_password =
    smtp_port = 587
    smtp_mail_from = myemail@gmail.com

And my DAG is as below:

    from datetime import datetime
    from
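For Gmail, the commented-out smtp_user and smtp_password normally have to be set as well (often with an app password rather than the account password). Once SMTP works, mail can be sent either through email_on_failure/email in default_args or with an explicit EmailOperator; a minimal sketch with placeholder addresses follows, assuming a DAG object defined as in the question's (truncated) code:

    from airflow.operators.email_operator import EmailOperator

    send_report = EmailOperator(
        task_id='send_report',
        to='someone@example.com',                    # placeholder recipient
        subject='Airflow test email',
        html_content='<p>The pipeline finished.</p>',
        dag=dag,
    )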

Assigning tasks to specific machines with airflow

放肆的年华 submitted on 2019-11-30 13:05:32
I'm new to Airflow. I have a DAG which contains a task that should run on a specific machine (an EMR cluster in my case). How can I tell Airflow where to run specific tasks, so that every time it runs it will do so on that machine only?

Run your worker on that machine with a queue name. With the Airflow CLI you could do something like:

    airflow worker -q my_queue

Then define the task to use that queue:

    task = PythonOperator(
        task_id='task',
        python_callable=my_callable,
        queue='my_queue',
        dag=dag)

Source: https://stackoverflow.com/questions/43186335/assigning-tasks-to-specific-machines-with-airflow