airflow

What is the difference between min_file_process_interval and dag_dir_list_interval in Apache Airflow 1.9.0?

Submitted by 家住魔仙堡 on 2019-12-21 05:11:10
Question: We are using Airflow v1.9.0. We have 100+ DAGs and the instance is really slow; the scheduler launches only some of the tasks. To reduce CPU usage, we want to tweak two configuration parameters: min_file_process_interval and dag_dir_list_interval. The documentation is not really clear about the difference between the two. Answer 1: min_file_process_interval: In cases where there are only a small number of DAG definition files, the loop could potentially process the
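Both parameters live in the `[scheduler]` section of airflow.cfg. A minimal sketch of where they sit (the values below are illustrative, not recommendations):

```ini
# airflow.cfg — [scheduler] section (values are illustrative)
[scheduler]
# seconds before the scheduler re-parses an individual DAG file it has already seen
min_file_process_interval = 60
# seconds between full scans of the dags folder for new or deleted files
dag_dir_list_interval = 300
```

Raising either value trades freshness (how quickly edits or new DAG files are picked up) for lower scheduler CPU usage.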

configuring Airflow to work with CeleryExecutor

Submitted by 拈花ヽ惹草 on 2019-12-21 04:11:31
Question: I am trying to configure Airbnb Airflow to use the CeleryExecutor. I changed the executor in airflow.cfg from SequentialExecutor to CeleryExecutor: # The executor class that airflow should use. Choices include # SequentialExecutor, LocalExecutor, CeleryExecutor executor = CeleryExecutor But I get the following error: airflow.configuration.AirflowConfigException: error: cannot use sqlite with the CeleryExecutor Note that sql_alchemy_conn is configured like this: sql_alchemy_conn =
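The error means the metadata database is still SQLite, which Airflow refuses to pair with the CeleryExecutor. A config sketch of the keys involved (hostnames and credentials below are placeholders, not values from the question):

```ini
# airflow.cfg — CeleryExecutor requires a server-based metadata DB, not SQLite
# (all connection details below are placeholders)
executor = CeleryExecutor
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost:5432/airflow

[celery]
broker_url = redis://localhost:6379/0
```

After switching sql_alchemy_conn away from SQLite, re-run `airflow initdb` against the new database.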

Debugging Broken DAGs

Submitted by 混江龙づ霸主 on 2019-12-20 20:37:23
Question: When the airflow webserver shows errors like Broken DAG: [<path/to/dag>] <error>, how and where can we find the full stack trace for these exceptions? I tried these locations: /var/log/airflow/webserver -- no logs in the timeframe of execution; the other logs were binary, and decoding them with strings gave no useful information. /var/log/airflow/scheduler -- had some logs, but also in binary form; reading them, they looked to be mostly sqlalchemy logs, probably for airflow's database. /var
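A quick way to see the full stack trace behind a "Broken DAG" banner is to execute the DAG file directly, the same way the scheduler parses it. A small stdlib sketch of that idea (the function name is mine, not an Airflow API):

```python
import traceback

def dag_import_error(dag_path):
    """Execute a DAG file the way `python dag.py` would and return the full
    stack trace as a string, or None if the file parses cleanly."""
    try:
        with open(dag_path) as f:
            exec(compile(f.read(), dag_path, "exec"), {"__name__": "dag_check"})
        return None
    except Exception:
        return traceback.format_exc()

# Usage: print(dag_import_error("/path/to/dag.py"))
```

In practice simply running `python /path/to/dag.py` from the shell gives the same traceback; the helper above is just that trick wrapped so the trace can be captured programmatically.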

Store and access password using Apache airflow

Submitted by 流过昼夜 on 2019-12-20 17:29:26
Question: We are using airflow as a scheduler. I want to invoke a simple bash operator in a DAG. The bash script needs a password as an argument to do further processing. How can I store a password securely in airflow (config/variables/connection) and access it in the DAG definition file? I am new to airflow and Python, so a code snippet would be appreciated. Answer 1: You can store the password in a Connection and retrieve it through a Hook - it will be encrypted at rest as long as you have set up your Fernet key. Here is how you can create a connection.
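Airflow can also pick connections up from environment variables named `AIRFLOW_CONN_<CONN_ID>` holding a connection URI, with the password in the URI's userinfo part. A stdlib-only sketch of how the password travels in such a URI (the connection id and credentials below are placeholders):

```python
import os
from urllib.parse import urlsplit

# Airflow reads connections from AIRFLOW_CONN_<CONN_ID> env vars as URIs;
# the password rides in the userinfo part (all values here are placeholders).
os.environ["AIRFLOW_CONN_MY_DB"] = "postgresql://svc_user:s3cret@db.example.com:5432/reports"

def conn_password(conn_id):
    """Return the password embedded in an AIRFLOW_CONN_* connection URI."""
    uri = os.environ["AIRFLOW_CONN_" + conn_id.upper()]
    return urlsplit(uri).password

print(conn_password("my_db"))  # -> s3cret
```

Note that env-var connections are not Fernet-encrypted; only connections stored in the metadata database get that protection, which is why the answer above recommends the Connection/Hook route.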

How do I restart airflow webserver?

Submitted by 旧巷老猫 on 2019-12-20 08:48:16
Question: I am using airflow for my data pipeline project. I have configured my project in airflow and start the airflow server as a background process using the following command: airflow webserver -p 8080 -D True The server runs successfully in the background. Now I want to enable authentication in airflow and have made the configuration changes in airflow.cfg, but the authentication functionality is not reflected in the server. When I stop and start the airflow server on my local machine it works. So how can I restart my daemonized airflow
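When the webserver is daemonized, Airflow 1.x records its PID in a pidfile next to airflow.cfg, so a restart is "kill the recorded PID, then start again". A small sketch of that pattern (the pidfile path is the 1.x default and is an assumption about this setup):

```shell
# stop_by_pidfile: send SIGTERM to the process recorded in a pidfile.
stop_by_pidfile() {
    pidfile="$1"
    [ -f "$pidfile" ] || { echo "no pidfile: $pidfile" >&2; return 1; }
    kill -TERM "$(cat "$pidfile")"
}

# Typical restart of a daemonized webserver (path is the 1.x default):
# stop_by_pidfile "$AIRFLOW_HOME/airflow-webserver.pid"
# airflow webserver -p 8080 -D
```

The webserver caches airflow.cfg at startup, which is why config changes such as authentication only take effect after a full stop/start like this.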

Airflow Scheduler Misunderstanding

Submitted by 試著忘記壹切 on 2019-12-20 06:25:32
Question: I'm new to Airflow. My goal is to run a DAG on a daily basis, starting 1 hour from now. I'm truly misunderstanding Airflow's "end-of-interval invoke" scheduling rules. From the docs (Airflow docs): Note that if you run a DAG on a schedule_interval of one day, the run stamped 2016-01-01 will be triggered soon after 2016-01-01T23:59. In other words, the job instance is started once the period it covers has ended. I set schedule_interval as follows: schedule_interval="00 15 * * *" and start
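The rule the docs describe can be reduced to one line of arithmetic: a run stamped with an execution_date actually fires once the interval it covers has ended, i.e. one schedule interval later. A tiny illustration (function name is mine, for illustration only):

```python
from datetime import datetime, timedelta

def actual_trigger_time(execution_date, interval):
    """Airflow fires a run once the interval it covers has *ended*:
    actual trigger time = execution_date + schedule interval."""
    return execution_date + interval

# With schedule_interval="00 15 * * *" (daily at 15:00), the run
# stamped 2016-01-01 15:00 fires a full day later:
print(actual_trigger_time(datetime(2016, 1, 1, 15, 0), timedelta(days=1)))
# -> 2016-01-02 15:00:00
```

This is why "start in 1 hour" intuition fails: the first run waits until a whole interval after start_date has elapsed.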

Airflow apply_defaults decorator reports Argument is required

Submitted by 谁说胖子不能爱 on 2019-12-20 05:28:11
Question: I recently ran into this nasty error where Airflow's apply_defaults decorator throws the following stack trace (my **kwargs do contain job_flow_id): File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed File "/mnt/airflow/dags/zanalytics-airflow/src/main/mysql_import/dags/mysql_import_dag.py", line 23, in <module> sync_dag_builder.build_sync_dag() File "/mnt/airflow/dags/zanalytics-airflow/src/main/mysql_import/dags/builders/sync_dag_builders/emr_sync_dag_builder.py",
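Airflow's apply_defaults inspects the decorated `__init__`'s signature and raises when a required argument is absent from the kwargs it actually receives. A simplified stand-in (not Airflow's real code; the operator classes are hypothetical) showing how an argument present in *your* **kwargs can still go "missing" if an intermediate `__init__` fails to forward it:

```python
import functools
import inspect

def apply_defaults_sketch(func):
    """Simplified stand-in for Airflow's apply_defaults: every __init__
    parameter without a default must appear in the kwargs received."""
    sig = inspect.signature(func)
    required = [
        name for name, p in sig.parameters.items()
        if name != "self"
        and p.default is inspect.Parameter.empty
        and p.kind in (p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)
    ]

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        missing = [name for name in required if name not in kwargs]
        if missing:
            raise TypeError("Argument %r is required" % missing[0])
        return func(self, *args, **kwargs)
    return wrapper

class EmrStepOperator:                      # hypothetical operator
    @apply_defaults_sketch
    def __init__(self, job_flow_id, **kwargs):
        self.job_flow_id = job_flow_id

class LossyOperator(EmrStepOperator):       # hypothetical subclass
    def __init__(self, **kwargs):
        kwargs.pop("job_flow_id")           # dropped before super().__init__
        super().__init__(**kwargs)          # -> "Argument 'job_flow_id' is required"
```

So when this error appears despite job_flow_id being in your kwargs, the place to look is the chain of `__init__` calls between your DAG builder and the operator that is decorated.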

Airflow 1.10 - Scheduler Startup Fails

Submitted by 旧街凉风 on 2019-12-20 02:19:07
Question: I've just painfully installed Airflow 1.10, thanks to my previous post here. We have a single EC2 instance running, our queue is AWS ElastiCache Redis, and our metadata database is AWS RDS for PostgreSQL. Airflow works with this setup just fine on version 1.9, but we are encountering an issue on version 1.10 when we go to start up the scheduler. [2018-08-15 16:29:14,015] {jobs.py:385} INFO - Started process (PID=15778) to work on /home/ec2-user/airflow/dags/myDag.py

Airflow Worker Daemon exits for no visible reason

Submitted by 故事扮演 on 2019-12-20 01:43:09
Question: I have Airflow 1.9 running inside a virtual environment, set up with Celery and Redis, and it works well. However, I wanted to daemonize the setup and used the instructions here. It works well for the Webserver, Scheduler, and Flower, but fails for the Worker, which is, of course, the core of it all. My airflow-worker.service file looks like this: [Unit] Description=Airflow celery worker daemon After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service Wants
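For reference, a commonly used [Service] section sketch to pair with a [Unit] block like the one above; the user, paths, and AIRFLOW_HOME below are assumptions about a typical virtualenv install, not values taken from the question:

```ini
[Service]
# paths and user are assumptions for a typical virtualenv install
Environment="AIRFLOW_HOME=/home/airflow/airflow"
User=airflow
Group=airflow
Type=simple
# point ExecStart at the airflow binary inside the virtualenv so the
# worker runs with the environment's packages without "activating" it
ExecStart=/home/airflow/venv/bin/airflow worker
Restart=on-failure
RestartSec=10s

[Install]
WantedBy=multi-user.target
```

A worker that dies silently under systemd while the Webserver and Scheduler survive is often an environment problem: ExecStart must resolve to the virtualenv's airflow, and AIRFLOW_HOME must be set in the unit, since systemd does not inherit a login shell's environment.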