Question
When using LocalExecutor with a MySQL backend, running airflow scheduler on my CentOS 6 box creates 33 scheduler processes, e.g.
deploy 55362 13.5 1.8 574224 73272 ? Sl 18:59 7:42 /usr/local/bin/python2.7 /usr/local/bin/airflow scheduler
deploy 55372 0.0 1.5 567928 60552 ? Sl 18:59 0:00 /usr/local/bin/python2.7 /usr/local/bin/airflow scheduler
deploy 55373 0.0 1.5 567928 60540 ? Sl 18:59 0:00 /usr/local/bin/python2.7 /usr/local/bin/airflow scheduler
...
These are distinct from the executor processes and from the gunicorn master and worker processes.
Running it with the SequentialExecutor (sqlite backend) kicks off just one scheduler process.
Airflow still works (DAGs are getting run), but the sheer number of these processes makes me think something is wrong.
When I run select * from job where state = 'running'; against the database, only 5 SchedulerJob rows are returned.
Is this normal?
Answer 1:
Yes, this is normal. These are scheduler processes. You can control their number with the following parameter in airflow.cfg:
# The amount of parallelism as a setting to the executor. This defines
# the max number of task instances that should run simultaneously
# on this airflow installation
parallelism = 32
These are spawned by the scheduler, whose pid can be found in the airflow-scheduler.pid file,
so 32 + 1 = 33 processes, which is what you are seeing.
Hope this clears up your doubt.
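For illustration, here is a minimal sketch (not Airflow's own code) of how a setting like this can be read from an airflow.cfg-style file with Python's standard configparser; the [core] section name matches Airflow's standard layout, and the inline config text is a hypothetical stand-in for a real file:

```python
from configparser import ConfigParser

# Hypothetical minimal airflow.cfg contents, for illustration only.
cfg_text = """
[core]
parallelism = 32
"""

parser = ConfigParser()
parser.read_string(cfg_text)

parallelism = parser.getint("core", "parallelism")
print(parallelism)      # executor pool size -> 32
print(parallelism + 1)  # pool + the main scheduler process -> 33
```

Against a real installation you would call parser.read("/path/to/airflow.cfg") instead of read_string.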
Cheers!
Answer 2:
As of v1.10.3, this is what I found. My settings are:
parallelism = 32
max_threads = 4
There are a total of
- 1 (main) +
- 32 (executors) +
- 1 (dag_processor_manager) +
- 4 (dag processors)
= 38 processes!
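The arithmetic above can be sketched as a small helper; this is a back-of-the-envelope check, not Airflow code, and the breakdown assumes the v1.10.3 behavior described in this answer:

```python
def expected_scheduler_processes(parallelism: int, max_threads: int) -> int:
    """Rough process count for `airflow scheduler` on v1.10.3 (per this answer)."""
    main = 1                      # the main `airflow scheduler` process
    executors = parallelism       # LocalExecutor worker pool
    manager = 1                   # dag_processor_manager
    dag_processors = max_threads  # DAG file processors
    return main + executors + manager + dag_processors

print(expected_scheduler_processes(32, 4))  # -> 38
```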
Source: https://stackoverflow.com/questions/42729161/running-airflow-scheduler-launches-33-scheduler-processes