apache-airflow

How does Airflow's BranchPythonOperator work?

Question: I'm struggling to understand how BranchPythonOperator in Airflow works. I know it's primarily used for branching, but I am confused by the documentation as to what to pass into a task and what I need to pass/expect from the upstream task. Given the simple example in the documentation on this page, what would the source code look like for the upstream task called run_this_first and the two downstream ones that are branched? How exactly does Airflow know to run branch_a instead of branch_b? Where …
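For reference, a minimal sketch of what that setup usually looks like in Airflow 1.10-era code; the DAG id, callable body, and branching condition are illustrative assumptions, with the task names taken from the documentation example:

    # A minimal branching sketch (Airflow 1.10-style imports); the branch
    # callable returns the task_id of the path to follow, and the task(s)
    # not returned are marked as skipped.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python_operator import BranchPythonOperator
    from airflow.operators.dummy_operator import DummyOperator

    dag = DAG('branch_demo', start_date=datetime(2019, 1, 1), schedule_interval='@daily')

    def choose_branch(**kwargs):
        # Illustrative condition: branch on the day of the execution date.
        return 'branch_a' if kwargs['execution_date'].day % 2 == 0 else 'branch_b'

    run_this_first = BranchPythonOperator(
        task_id='run_this_first',
        python_callable=choose_branch,
        provide_context=True,   # needed on 1.10 so kwargs are passed in
        dag=dag,
    )

    branch_a = DummyOperator(task_id='branch_a', dag=dag)
    branch_b = DummyOperator(task_id='branch_b', dag=dag)

    run_this_first >> [branch_a, branch_b]

In other words, nothing special is passed from upstream: Airflow runs the callable, takes the returned task_id string, and skips every directly downstream task whose id was not returned.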

How to install apache airflow from github

Question: I want to install apache-airflow using the latest version of Apache Airflow on GitHub, with all of its dependencies. How can I do that using pip? And is it safe to use in a production environment?

Answer 1: Using pip:

    $ pip install git+https://github.com/apache/incubator-airflow.git@v1-10-stable

Yes, it is safe. You will need gcc.

Answer 2: I generally use this:

    $ pip install git+https://github.com/apache/incubator-airflow.git@v1-10-stable#egg=apache-airflow[async,crypto,celery,kubernetes…
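One caveat worth adding to Answer 2 (my note, not part of either answer): the square brackets in the extras list are glob characters in some shells, zsh in particular, so quoting the whole requirement avoids a "no matches found" error. The extras shown here are illustrative; pick the ones your deployment actually needs:

    $ pip install "git+https://github.com/apache/incubator-airflow.git@v1-10-stable#egg=apache-airflow[async,crypto,celery,kubernetes]"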

How to obtain and process mysql records using Airflow?

Question: I need to (1) run a SELECT query on a MySQL DB and fetch the records, and (2) process the records with a Python script. I am unsure about the way I should proceed. Is XCom the way to go here? Also, MySqlOperator only executes the query; it doesn't fetch the records. Is there any built-in transfer operator I can use? How can I use a MySQL hook here? You may want to use a PythonOperator that uses the hook to get the data, apply the transformation, and ship the (now scored) rows back to some other place. Can someone …
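To make the PythonOperator-plus-hook suggestion concrete, a minimal sketch; the connection id, query, and processing step are illustrative assumptions:

    # Fetch rows with MySqlHook inside a PythonOperator and process them
    # in-process; XCom is meant for small metadata, not full result sets.
    from datetime import datetime
    from airflow import DAG
    from airflow.hooks.mysql_hook import MySqlHook
    from airflow.operators.python_operator import PythonOperator

    def fetch_and_process():
        hook = MySqlHook(mysql_conn_id='mysql_default')            # assumed connection id
        rows = hook.get_records('SELECT id, score FROM my_table')  # illustrative query
        for row_id, score in rows:
            print(row_id, score)  # replace with the real transformation / load step

    dag = DAG('mysql_fetch_demo', start_date=datetime(2019, 1, 1), schedule_interval='@daily')

    fetch = PythonOperator(task_id='fetch_and_process', python_callable=fetch_and_process, dag=dag)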

Can airflow be used to run a never ending task?

Question: Can we use an Airflow DAG to define a never-ending job (i.e. a task with an unconditional loop that consumes stream data) by setting the task/DAG timeout to None and manually triggering its run? Would having Airflow monitor a never-ending task cause a problem? Thanks

Answer 1: A bit odd to run this through Airflow, but yeah, I don't think that's an issue. Just note that if you restart the worker running the job (assuming CeleryExecutor), you'll interrupt the task and need to kick it off manually.
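For concreteness, a sketch of what such a DAG could look like; the DAG id, loop body, and sleep are illustrative assumptions, and schedule_interval=None means runs start only when triggered by hand:

    # A manually triggered, never-ending consumer task.
    import time
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    def consume_forever():
        while True:          # unconditional loop, as described in the question
            # placeholder for the real stream-consuming logic
            time.sleep(1)

    dag = DAG('stream_consumer', start_date=datetime(2019, 1, 1), schedule_interval=None)

    consume = PythonOperator(
        task_id='consume',
        python_callable=consume_forever,
        execution_timeout=None,   # no task-level timeout
        dag=dag,
    )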

Airflow worker is not listening to default rabbitmq queue

Question: I have configured Airflow with a RabbitMQ broker; the services airflow worker, airflow scheduler, and airflow webserver are running without any errors. The scheduler is pushing tasks to execute onto the default RabbitMQ queue. I even tried airflow worker -q=default, but the worker is still not receiving tasks to run. My airflow.cfg settings file:

    [core]
    # The home folder for airflow, default is ~/airflow
    airflow_home = /home/my_projects/ksaprice_project/airflow
    # The folder where your airflow pipelines live, …
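The config excerpt cuts off before the Celery section, which is where this symptom usually originates. As a reference only (host, credentials, and values are assumptions, and key names shifted slightly across 1.x releases), the settings that need to line up between scheduler and worker look roughly like:

    [core]
    executor = CeleryExecutor

    [celery]
    broker_url = amqp://guest:guest@localhost:5672//
    celery_result_backend = db+mysql://airflow:airflow@localhost:3306/airflow
    default_queue = default

If the scheduler and the worker disagree on broker_url or default_queue, the scheduler happily enqueues tasks that no worker is listening for.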

Airflow Scheduler not picking up DAG Runs

Question: I'm setting up Airflow such that the webserver runs on one machine and the scheduler runs on another. Both share the same MySQL metastore database. Both instances come up without any errors in the logs, but the scheduler is not picking up any DAG runs that are created by manually triggering the DAGs via the web UI. The dag_run table in MySQL shows a few entries, all in the running state:

    mysql> select * from dag_run;
    [truncated table output: a handful of dag_run rows, all with state 'running']
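A checklist that often resolves this split-machine setup (my summary, not a confirmed answer from the thread; the paths and connection string are illustrative):

    [core]
    # Must be identical on the webserver and scheduler machines:
    sql_alchemy_conn = mysql://airflow:airflow@metastore-host:3306/airflow
    # The same DAG files must also exist on the scheduler machine:
    dags_folder = /opt/airflow/dags
    # SequentialExecutor cannot run real workloads; use Local/Celery:
    executor = LocalExecutor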

Reversed upstream/downstream relationships when generating multiple tasks in Airflow

Question: The original code related to this question can be found here. I'm confused by how both the bitshift operators and the set_upstream / set_downstream methods work within a task loop that I've defined in my DAG. When the main execution loop of the DAG is configured as follows:

    for uid in dash_workers.get_id_creds():
        clear_tables.set_downstream(id_worker(uid))

or

    for uid in dash_workers.get_id_creds():
        clear_tables >> id_worker(uid)

the graph looks like this (the alphanumeric sequences are the user …
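For reference, the bitshift operators are just sugar for the set_* methods, so the two loops above should build the same graph:

    clear_tables >> id_worker(uid)   # equivalent to clear_tables.set_downstream(id_worker(uid))
    clear_tables << id_worker(uid)   # equivalent to clear_tables.set_upstream(id_worker(uid))

If the arrows come out reversed, the usual suspect is mixing up which operand is upstream, not the loop itself.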

Airflow DAG not getting scheduled

Question: I am new to Airflow and have created my first DAG. Here is my DAG code. I want the DAG to start now and thereafter run once a day.

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    from datetime import datetime, timedelta

    default_args = {
        'owner': 'airflow',
        'depends_on_past': False,
        'start_date': datetime.now(),
        'email': ['aaaa@gmail.com'],
        'email_on_failure': False,
        'email_on_retry': False,
        'retries': 1,
        'retry_delay': timedelta(minutes=5),
    }

    dag = DAG(
        'alamode…
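A likely culprit is visible even in this truncated snippet (a well-known pitfall, though not confirmed as this poster's fix): start_date=datetime.now() moves every time the file is parsed, and the scheduler only triggers a run after start_date + schedule_interval has fully passed, so the DAG may never fire. Pinning a fixed date in the past avoids this; the '@daily' interval below is an assumption matching "once a day":

    # Pin start_date instead of using datetime.now().
    default_args = {
        'owner': 'airflow',
        'depends_on_past': False,
        'start_date': datetime(2019, 1, 1),   # fixed, in the past
        'retries': 1,
        'retry_delay': timedelta(minutes=5),
    }
    dag = DAG('alamode', default_args=default_args, schedule_interval='@daily')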

Dynamically create list of tasks

Question: I have a DAG which is created by querying DynamoDB for a list; for each item in the list, a task is created using a PythonOperator and added to the DAG. Not shown in the example below, but it's important to note that some of the items in the list depend on other tasks, so I'm using set_upstream to enforce the dependencies.

    - airflow_home
      \- dags
        \- workflow.py

    workflow.py:

    def get_task_list():
        # ... query dynamodb ...

    def run_task(task):
        # ... do stuff ...

    dag = DAG(dag_id='my_dag', ...)
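Filling in the skeleton as a runnable sketch (the static ITEMS list stands in for the DynamoDB query, which is an assumption; note that whatever get_task_list does will run at every DAG parse, so it should be fast):

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    ITEMS = ['a', 'b', 'c']   # placeholder for get_task_list() / DynamoDB

    def run_task(item):
        print('processing', item)   # ... do stuff ...

    dag = DAG(dag_id='my_dag', start_date=datetime(2019, 1, 1), schedule_interval='@daily')

    for item in ITEMS:
        PythonOperator(
            task_id='run_%s' % item,   # task ids must be unique per item
            python_callable=run_task,
            op_args=[item],
            dag=dag,
        )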

understanding the tree view in apache airflow

Question: I set up the DAG from https://airflow.apache.org/tutorial.html as-is, the only change being that I set the DAG to run at an interval of 5 minutes with a start date of 2017-12-17T13:40:00 UTC. I enabled the DAG before 13:40, so there was no backfill, and my machine is running on UTC. The DAG ran as expected (i.e. at an interval of 5 minutes, starting at 13:45 UTC). Now, when I go to the tree view, I am failing to understand the graph. There are 3 tasks in total. 'sleep' (t2) has upstream …
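One detail that explains the 13:45 start, and often the tree-view confusion as well (my note, not part of the question): Airflow stamps each run with the execution_date that opens the interval, but only starts the run once that interval has closed. For the 5-minute schedule above:

    execution_date = 2017-12-17 13:40:00  ->  run actually starts ~13:45:00
    execution_date = 2017-12-17 13:45:00  ->  run actually starts ~13:50:00

So the columns in the tree view are labeled one interval earlier than the wall-clock time at which they ran.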