Question
I am scheduling a DAG in Airflow to run every 10 minutes, but it is not doing anything.
Here is my DAG's code:
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime.now(),
    'email': ['airflow@airflow.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG('Python_call', default_args=default_args, schedule_interval='*/10 * * * *')

t1 = BashOperator(
    task_id='testairflow',
    bash_command='python /var/www/projects/python_airflow/airpy/hello.py',
    dag=dag)
And the scheduler log looks like this:
[2018-01-05 14:05:08,536] {jobs.py:351} DagFileProcessor484 INFO - Processing /var/www/projects/python_airflow/airpy/airflow_home/dags/scheduler.py took 2.278 seconds
[2018-01-05 14:05:09,712] {jobs.py:343} DagFileProcessor485 INFO - Started process (PID=29795) to work on /var/www/projects/python_airflow/airpy/airflow_home/dags/scheduler.py
[2018-01-05 14:05:09,715] {jobs.py:534} DagFileProcessor485 ERROR - Cannot use more than 1 thread when using sqlite. Setting max_threads to 1
[2018-01-05 14:05:09,717] {jobs.py:1521} DagFileProcessor485 INFO - Processing file /var/www/projects/python_airflow/airpy/airflow_home/dags/scheduler.py for tasks to queue
[2018-01-05 14:05:09,717] {models.py:167} DagFileProcessor485 INFO - Filling up the DagBag from /var/www/projects/python_airflow/airpy/airflow_home/dags/scheduler.py
[2018-01-05 14:05:10,057] {jobs.py:1535} DagFileProcessor485 INFO - DAG(s) dict_keys(['example_passing_params_via_test_command', 'latest_only_with_trigger', 'example_branch_operator', 'example_subdag_operator', 'latest_only', 'example_skip_dag', 'example_subdag_operator.section-1', 'example_subdag_operator.section-2', 'tutorial', 'example_http_operator', 'example_trigger_controller_dag', 'example_bash_operator', 'example_python_operator', 'test_utils', 'Python_call', 'example_trigger_target_dag', 'example_xcom', 'example_short_circuit_operator', 'example_branch_dop_operator_v3']) retrieved from /var/www/projects/python_airflow/airpy/airflow_home/dags/scheduler.py
[2018-01-05 14:05:12,039] {jobs.py:1169} DagFileProcessor485 INFO - Processing Python_call
[2018-01-05 14:05:12,048] {jobs.py:566} DagFileProcessor485 INFO - Skipping SLA check for <DAG: Python_call> because no tasks in DAG have SLAs
[2018-01-05 14:05:12,060] {models.py:322} DagFileProcessor485 INFO - Finding 'running' jobs without a recent heartbeat
[2018-01-05 14:05:12,061] {models.py:328} DagFileProcessor485 INFO - Failing jobs without heartbeat after 2018-01-05 14:00:12.061146
And here is the command-line output of airflow scheduler:
[2018-01-05 14:31:20,496] {dag_processing.py:627} INFO - Started a process (PID: 32222) to generate tasks for /var/www/projects/python_airflow/airpy/airflow_home/dags/scheduler.py - logging into /var/www/projects/python_airflow/airpy/airflow_home/logs/scheduler/2018-01-05/scheduler.py.log
[2018-01-05 14:31:23,122] {jobs.py:1002} INFO - No tasks to send to the executor
[2018-01-05 14:31:23,123] {jobs.py:1440} INFO - Heartbeating the executor
[2018-01-05 14:31:23,123] {jobs.py:1450} INFO - Heartbeating the scheduler
[2018-01-05 14:31:24,243] {jobs.py:1404} INFO - Heartbeating the process manager
[2018-01-05 14:31:24,244] {dag_processing.py:559} INFO - Processor for /var/www/projects/python_airflow/airpy/airflow_home/dags/scheduler.py finished
Answer 1:
Airflow is an ETL/data-pipelining tool, which means it is meant to execute things over periods that have already gone by. E.g. using:
- the task parameter 'start_date': datetime(2018,1,4)
- the DAG parameter schedule_interval='@daily' (which is also the default)
means that the DAG won't run until a whole schedule-interval unit (one day) has passed since the start date; that is, once the Airflow server time reaches datetime(2018,1,5).
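To make that concrete, here is a minimal sketch of when the first run would be triggered under those parameters (plain Python as an illustration, not the scheduler's actual code):

from datetime import datetime, timedelta

start_date = datetime(2018, 1, 4)
interval = timedelta(days=1)  # what '@daily' amounts to

# The scheduler only triggers a run for a period once that whole
# period has passed, so the run for 2018-01-04 fires at 2018-01-05.
first_trigger = start_date + interval
print(first_trigger)  # 2018-01-05 00:00:00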
Since you have a start_date of datetime.now() together with your */10 * * * * interval, the aforementioned condition is never fulfilled (refer to the official FAQ).
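The reason is that the scheduler re-parses the DAG file continuously, and every parse re-evaluates datetime.now(), so the start date keeps moving forward. A rough sketch of the effect (again plain Python, my illustration rather than the scheduler's internals):

from datetime import datetime, timedelta

interval = timedelta(minutes=10)  # your '*/10 * * * *' schedule

# Each time the DAG file is parsed, start_date is re-evaluated to
# "now", so start_date + interval is always in the future and no
# schedule period ever fully passes:
start_date = datetime.now()
print(start_date + interval > datetime.now())  # always True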
You can change the start_date parameter to, e.g., yesterday (using a timedelta for a relative start_date earlier than today, although this is not recommended). I would advise using 'start_date': datetime(2018,1,1) and setting schedule_interval='@once' on the DAG for test purposes. This should get your DAG to run.
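For reference, this is your DAG with just those two changes applied (a static start_date in the past and '@once'; everything else is unchanged from the question):

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2018, 1, 1),  # static date in the past
    'email': ['airflow@airflow.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG('Python_call', default_args=default_args, schedule_interval='@once')

t1 = BashOperator(
    task_id='testairflow',
    bash_command='python /var/www/projects/python_airflow/airpy/hello.py',
    dag=dag)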
Source: https://stackoverflow.com/questions/48110703/task-scheduling-in-airflow-is-not-working