Status of an Airflow task within the DAG


Question


I need the status of a task (e.g. whether it is running, up_for_retry, or failed) from within the same DAG. I tried to get it using the code below, but got no useful output...

Auto = PythonOperator(
    task_id='test_sleep',
    python_callable=execute_on_emr,
    op_kwargs={'cmd': 'python /home/hadoop/test/testsleep.py'},
    dag=dag)

logger.info(Auto)  # this only logs the operator object itself, not the task's state

The intention is to kill certain running tasks once a particular Airflow task completes.

The question is: how do I get the state of a task, i.e. whether it is running, failed, or successful?


Answer 1:


I am doing something similar: for one task, I need to check whether the previous 10 runs of another task were successful. taky2 sent me down the right path. It is actually fairly easy:

from airflow.models import TaskInstance

ti = TaskInstance(your_task, execution_date)  # your_task is the task object, execution_date a datetime
state = ti.current_state()

Since I want to check this from within the DAG, it is not necessary to specify the DAG. I simply created a function to loop over the past n_days and check the status:

from datetime import timedelta

def check_status(**kwargs):
    last_n_days = 10
    for n in range(last_n_days):
        date = kwargs['execution_date'] - timedelta(n)
        # my_task is the task object you defined within the DAG, not the task_id string
        # (in the example below: check_success_task rather than 'check_success_days_before')
        ti = TaskInstance(my_task, date)
        state = ti.current_state()
        if state != 'success':
            raise ValueError('Not all previous tasks completed successfully.')

When you call the function, make sure to set provide_context=True:

check_success_task = PythonOperator(
    task_id='check_success_days_before',
    python_callable=check_status,
    provide_context=True,
    dag=dag
)

UPDATE: When you want to check a task from another DAG, you need to fetch it like this:

from airflow import configuration as conf
from airflow.models import DagBag, TaskInstance

dag_folder = conf.get('core', 'DAGS_FOLDER')
dagbag = DagBag(dag_folder)
check_dag = dagbag.dags[my_dag_id]        # my_dag_id is the dag_id string of the other DAG
my_task = check_dag.get_task(my_task_id)  # my_task_id is the task_id string
ti = TaskInstance(my_task, date)
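
You can then check its state exactly as before:

state = ti.current_state()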



Answer 2:


Okay, I think I know what you're doing, and I don't really agree with it, but I'll start with an answer.

A straightforward, but hackish, way would be to query the task_instance table. I'm on Postgres, but the structure should be the same. Start by grabbing the task_id and state of the task you're interested in with a database call:

SELECT task_id, state
FROM task_instance
WHERE dag_id = '<dag_id_attrib>'
  AND execution_date = '<execution_date_attrib>'
  AND task_id = '<task_to_check>'

That should give you the state (and the name, for reference) of the task you're trying to monitor. The state is stored as a simple lowercase string.
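
If you would rather issue that query from Python than from psql, a minimal sketch could look like this (assuming psycopg2; the connection settings, dag_id, task_id, and execution date are placeholders for your Airflow metadata database):

import psycopg2
from datetime import datetime

# Placeholder connection settings; point them at your Airflow metadata database.
conn = psycopg2.connect(host='localhost', dbname='airflow',
                        user='airflow', password='airflow')
cur = conn.cursor()
cur.execute(
    "SELECT task_id, state FROM task_instance "
    "WHERE dag_id = %s AND execution_date = %s AND task_id = %s",
    ('my_dag_id', datetime(2019, 12, 9), 'task_to_check'))
row = cur.fetchone()             # None if the task instance does not exist yet
state = row[1] if row else None  # e.g. 'running', 'success', 'failed'
cur.close()
conn.close()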




Answer 3:


Take a look at the code responsible for the command-line interface operation suggested by Priyank:

https://github.com/apache/incubator-airflow/blob/2318cea74d4f71fba353eaca9bb3c4fd3cdb06c0/airflow/bin/cli.py#L581

def task_state(args):
    dag = get_dag(args)
    task = dag.get_task(task_id=args.task_id)
    ti = TaskInstance(task, args.execution_date)
    print(ti.current_state())

Hence, it seems you should easily be able to accomplish this within your DAG codebase using similar code.

Alternatively, you could execute these CLI operations from within your code using Python's subprocess library.
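
For example, a minimal sketch (the dag_id, task_id, and execution date are placeholders, and parsing the state off the last line of stdout is an assumption about the CLI's output format):

import subprocess

# Placeholder dag_id, task_id, and execution date; replace with your own.
result = subprocess.run(
    ['airflow', 'task_state', 'my_dag', 'my_task', '2019-12-09'],
    capture_output=True, text=True)

lines = result.stdout.strip().splitlines()
state = lines[-1] if lines else None  # assumes the CLI prints the state last
print(state)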




Answer 4:


You can use the command-line interface for this:

 airflow task_state [-h] [-sd SUBDIR] dag_id task_id execution_date

For more on this, refer to the official Airflow documentation:

http://airflow.incubator.apache.org/cli.html



Source: https://stackoverflow.com/questions/43732642/status-of-airflow-task-within-the-dag
