airflow

Cloud Composer (Airflow) jobs stuck

泪湿孤枕 submitted on 2019-12-22 12:22:15
Question: My Cloud Composer-managed Airflow environment has been stuck for hours since I cancelled a Task Instance that was taking too long (let's call it Task A). I've cleared all the DAG Runs and task instances, but a few jobs are still running, plus one job in the Shutdown state (I suppose the job of Task A) (snapshot of my Jobs). Besides, it seems that the scheduler is not running, since recently deleted DAGs keep appearing in the dashboard. Is there a way to kill the jobs or reset the scheduler? Any idea to un-stuck
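One approach worth sketching (an assumption-laden workaround, not an official Composer procedure) is to mark the orphaned job rows as failed directly in the Airflow metadata database so the scheduler can start fresh. The model and state names below match Airflow 1.10; back up the database before mutating it:

```python
# Sketch: fail stuck jobs in the Airflow 1.10 metadata DB.
# Requires a connection to the metadata database (e.g. via Composer's
# SQL proxy). This mutates scheduler state, so use with care.
from airflow import settings
from airflow.jobs import BaseJob
from airflow.utils.state import State

session = settings.Session()
stuck_jobs = session.query(BaseJob).filter(
    BaseJob.state.in_([State.RUNNING, State.SHUTDOWN])
).all()
for job in stuck_jobs:
    job.state = State.FAILED  # the scheduler spawns fresh jobs afterwards
session.commit()
session.close()
```

Since Composer runs the scheduler on the environment's GKE cluster, deleting the airflow-scheduler pod (which its deployment recreates) is another common way to reset it.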

Airflow k8s operator xcom - Handshake status 403 Forbidden

别等时光非礼了梦想. submitted on 2019-12-22 10:44:46
Question: When I run a Docker image using KubernetesPodOperator in Airflow version 1.10, once the pod finishes the task successfully, Airflow tries to fetch the xcom value by connecting to the pod via the k8s stream client. The following is the error I encountered:

```
[2018-12-18 05:29:02,209] {{models.py:1760}} ERROR - (0)
Reason: Handshake status 403 Forbidden
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/kubernetes/stream/ws_client.py", line 249, in websocket_call
```
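For context, a hedged sketch of how the operator is typically used in Airflow 1.10 (image, namespace, and ids are hypothetical placeholders). The xcom value is read back by exec-ing into the pod over a websocket, which is the call failing here, so a 403 on the handshake usually points to the service account lacking RBAC permission on pods/exec in that namespace:

```python
# Sketch for Airflow 1.10; names and image are hypothetical.
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

task = KubernetesPodOperator(
    task_id="my_pod_task",
    name="my-pod-task",
    namespace="default",
    image="my-registry/my-image:latest",
    xcom_push=True,  # the pod writes /airflow/xcom/return.json, and Airflow
                     # exec's into the pod afterwards to read it back
    dag=dag,
)
```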

How do I add a new dag to a running airflow service?

断了今生、忘了曾经 submitted on 2019-12-22 10:01:40
Question: I have an Airflow service currently running as separate Docker containers for the webserver and scheduler, both backed by a Postgres database. The DAGs are synced between the two instances, and they load appropriately when the services start. However, if I add a new DAG to the dag folder (on both containers) while the service is running, the DAG gets loaded into the dagbag but shows up in the web GUI with missing metadata. I can run "airflow initdb" after each update, but that
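For reference, a new DAG only has to be a parseable Python file in the dags folder; the scheduler's directory scan (controlled by dag_dir_list_interval) should then register it without an initdb. A minimal sketch, with all names hypothetical:

```python
# Minimal DAG file sketch; drop it into the dags folder on both containers
# and the scheduler should pick it up on its next directory scan.
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

dag = DAG(
    dag_id="my_new_dag",               # hypothetical id
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
)

start = DummyOperator(task_id="start", dag=dag)
```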

Airflow latency between tasks

落爺英雄遲暮 submitted on 2019-12-22 08:34:59
Question: As you can see in the image, Airflow is taking too much time between task executions; it represents almost 30% of the DAG execution time. I've changed the airflow.cfg file to:

```
job_heartbeat_sec = 1
scheduler_heartbeat_sec = 1
```

but I still have the same latency rate. Why does it behave this way? Answer 1: It is by design. For instance, I use Airflow to perform large workflows where some tasks can take a really long time. Airflow is not meant for tasks that take seconds to execute; it can be

Cannot access airflow web server via AWS load balancer HTTPS because airflow redirects me to HTTP

泪湿孤枕 submitted on 2019-12-22 05:31:53
Question: I have an Airflow web server configured on EC2; it listens on port 8080. I have an AWS ALB (application load balancer) in front of the EC2 instance, listening on HTTPS 80 (facing the internet), with the instance target port facing HTTP 8080. I cannot browse https://<airflow link> from the browser because the Airflow web server redirects me to http://<airflow link>/admin, which the ALB does not listen on. If I browse https://<airflow link>/admin/airflow/login?next=%2Fadmin%2F from the browser, then I see the login
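A common fix for this class of redirect problem (hedged, since the right knobs depend on the Airflow version) is to tell the web server its external scheme in airflow.cfg, so that generated redirects use https:

```
[webserver]
# assumed external hostname; replace with the real ALB DNS name
base_url = https://<airflow link>
# available in newer 1.10.x releases: trust the ALB's X-Forwarded-* headers
enable_proxy_fix = True
```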

Airflow File Sensor for sensing files on my local drive

你离开我真会死。 submitted on 2019-12-22 05:16:11
Question: Does anybody have any idea about FileSensor? I came across it while researching how to sense files in my local directory. The code is as follows:

```python
task = FileSensor(
    task_id="senseFile",
    filepath="etc/hosts",
    fs_conn_id='fs_local',
    _hook=self.hook,
    dag=self.dag,)
```

I have also set my conn_id and conn type as File (path) and gave {'path': 'mypath'}, but even though I set a non-existent path, or the file isn't there in the specified path, the task completes and the DAG is successful. The
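For comparison, a hedged sketch of the sensor wired up the usual way in Airflow 1.10, where it lives in contrib (connection id and paths are placeholders):

```python
# Sketch: FileSensor in Airflow 1.10.x. fs_conn_id must name a
# File (path) connection whose extra is {"path": "/base/dir"};
# filepath is then resolved against that base path.
from airflow.contrib.sensors.file_sensor import FileSensor

wait_for_file = FileSensor(
    task_id="wait_for_file",
    fs_conn_id="fs_local",        # hypothetical connection id
    filepath="data/input.csv",    # hypothetical relative path
    poke_interval=30,             # seconds between existence checks
    timeout=60 * 60,              # give up after an hour
    dag=dag,
)
```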

Apache Airflow : airflow initdb throws ModuleNotFoundError: No module named 'werkzeug.wrappers.json'; 'werkzeug.wrappers' is not a package error

孤人 submitted on 2019-12-22 04:52:46
Question: On Ubuntu 18.04 with Python 3.6.8, I am trying to install Airflow. When I run the airflow initdb command, the error below is thrown:

```
Traceback (most recent call last):
  File "/home/uEnAFpip/.virtualenvs/airflow/bin/airflow", line 21, in <module>
    from airflow import configuration
  File "/home/uEnAFpip/.virtualenvs/airflow/lib/python3.6/site-packages/airflow/__init__.py", line 40, in <module>
    from flask_admin import BaseView
  File "/home/uEnAFpip/.virtualenvs/airflow/lib/python3.6/site-packages/flask_admin/_
```
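The usual culprit (hedged, since the traceback is truncated before the failing import) is Werkzeug 1.0+, which removed the werkzeug.wrappers.json module that flask-admin and other Airflow 1.10 dependencies still import. Pinning Werkzeug below 1.0 inside the virtualenv is the commonly reported fix:

```
pip install 'werkzeug<1.0'
```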

How to run one airflow task and all its dependencies?

我只是一个虾纸丫 submitted on 2019-12-22 03:50:51
Question: I suspected that airflow run dag_id task_id execution_date would run all upstream tasks, but it does not. It simply fails when it sees that not all dependent tasks have run. How can I run a specific task and all its dependencies? I am guessing this is not possible because of an Airflow design decision, but is there a way to get around this? Answer 1: You can run a task independently by using the -i/-I/-A flags along with the run command. But yes, the design of Airflow does not permit running a
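For reference, a sketch of those flags on the Airflow 1.10 CLI (dag id, task id, and date are placeholders):

```
# -i  ignore task-specific dependencies such as upstream task state
# -I  ignore depends_on_past only
# -A  ignore all dependencies
airflow run -A my_dag my_task 2019-12-22
```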

Refreshing dags without web server restart apache airflow

怎甘沉沦 submitted on 2019-12-22 02:03:26
Question: Is there any way to reload the jobs without having to restart the server? Answer 1: In airflow.cfg, you have these two configuration options to control this behavior:

```
# after how much time a new DAG should be picked up from the filesystem
min_file_process_interval = 0
dag_dir_list_interval = 60
```

You might have to reload the web server, scheduler, and workers for your new configuration to take effect. Answer 2: DAGs should be reloaded when you update the associated Python file. If they are not, first try to

Airflow BashOperator log doesn't contain full output

心不动则不痛 submitted on 2019-12-21 18:13:21
Question: I have an issue where the BashOperator is not logging all of the output from wget; it logs only the first 1-5 lines of the output. I have tried this with only wget as the bash command:

```python
tester = BashOperator(
    task_id='testing',
    bash_command="wget -N -r -nd --directory-prefix='/tmp/' http://apache.cs.utah.edu/httpcomponents/httpclient/source/httpcomponents-client-4.5.3-src.zip",
    dag=dag)
```

I've also tried this as part of a longer bash script that has other commands that follow wget.
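One thing worth trying (a hedged suggestion, not a confirmed fix for this report): wget writes its progress meter to stderr, so switching it to non-verbose, line-oriented output and merging the streams gives the task logger simpler output to capture:

```python
# Sketch: the same download with non-verbose wget and stderr merged
# into stdout; task_id, URL, and dag come from the question above.
from airflow.operators.bash_operator import BashOperator

tester = BashOperator(
    task_id='testing',
    bash_command=(
        "wget -nv -N -r -nd --directory-prefix='/tmp/' "
        "http://apache.cs.utah.edu/httpcomponents/httpclient/source/"
        "httpcomponents-client-4.5.3-src.zip 2>&1"
    ),
    dag=dag,
)
```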