airflow

Airbnb Airflow using all system resources

Submitted by 萝らか妹 on 2020-05-25 03:23:46
Question: We've set up Airbnb/Apache Airflow for our ETL using LocalExecutor, and as we've started building more complex DAGs, we've noticed that Airflow has started using up incredible amounts of system resources. This is surprising to us because we mostly use Airflow to orchestrate tasks that happen on other servers, so Airflow DAGs spend most of their time waiting for them to complete--there's no actual execution that happens locally. The biggest issue is that Airflow seems to use up 100% of CPU

How to stop a DAG from backfilling? catchup_by_default=False and catchup=False do not seem to stop the Airflow Scheduler from backfilling

Submitted by 生来就可爱ヽ(ⅴ<●) on 2020-05-23 17:49:13
Question: The setting catchup_by_default=False in airflow.cfg does not seem to work. Adding catchup=False to the DAG doesn't work either. Here's how to reproduce the issue. I always start from a clean slate by running airflow resetdb. As soon as I unpause the dag, the tasks start to backfill. Here's the setup for the dag. I'm just using the tutorial example.

default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "start_date": datetime(2018, 9, 16),
    "email": ["airflow@airflow.com"],
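For reference, a minimal sketch of the usual fix, assuming a tutorial-style DAG like the one in the question: catchup is an argument of the DAG object itself, so setting it only in airflow.cfg or in default_args is not enough per DAG. The dag_id and task below are illustrative.

from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "start_date": datetime(2018, 9, 16),
}

# catchup must be passed to the DAG constructor; default_args only feeds the operators.
dag = DAG(
    "tutorial_no_backfill",               # illustrative dag_id
    default_args=default_args,
    schedule_interval=timedelta(days=1),
    catchup=False,                         # only the most recent interval runs on unpause
)

print_date = BashOperator(task_id="print_date", bash_command="date", dag=dag)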

Starting Airflow webserver fails with sqlalchemy.exc.NoInspectionAvailable: No inspection system is available

Submitted by 跟風遠走 on 2020-05-22 19:44:18
Question: Installation was done properly and the db was initialized properly, but trying to start the webserver shows the following error. I reinstalled everything but it's still not working. I would appreciate any help. Console output:

$:~/airflow# airflow webserver -p 8080
[Airflow ASCII art startup banner]
[2020-04-08 13:14:20,573] {__init__.py:51} INFO

A new Airflow gets created on its own (in airflow's default directory) instead of the Airflow that I installed in a specific directory. What is happening?

Submitted by 流过昼夜 on 2020-05-17 08:49:15
Question: I've installed Airflow in a virtual env, so prior to installing it I specified a directory in which the install would occur. Installation is done and everything works fine without any issues (in the same terminal that I used for installation), but if I open a new terminal, activate the env and run any airflow command, a new airflow gets generated out of nowhere in the default airflow location, so the airflow commands no longer access my airflow and instead access this new one. Even

Airflow writes and reads from S3 successfully but won't load S3 logs on docker-compose up

Submitted by 老子叫甜甜 on 2020-05-17 07:39:09
Question: I'm using Puckel's airflow docker (github link) with docker-compose-LocalExecutor. The project is deployed through CI/CD on an EC2 instance, so my airflow doesn't run on a persistent server (on every push to master it gets launched afresh). I know I'm losing some great features, but in my setup everything is configured by bash script and/or environment variables. My setup is similar to this answer's setup: Similar setup answer. I'm running on version 1.10.6, so the old method of adding config/__init

Retrieve full connection URI from Airflow Postgres hook

Submitted by 不打扰是莪最后的温柔 on 2020-05-16 22:03:17
Question: Is there a neater way to get the complete URI from a Postgres hook? .get_uri() doesn't include the "extra" params, so I am appending them like this:

def pg_conn_id_to_uri(postgres_conn_id):
    hook = PostgresHook(postgres_conn_id)
    uri = hook.get_uri()
    extra = hook.get_connection(postgres_conn_id).extra_dejson
    params = [f'{k}={v}' for k, v in extra.items()]
    if params:
        params = '&'.join(params)
        uri += f'?{params}'
    return uri

Answer 1: If cleaner doesn't necessarily imply for brevity here, then here's
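The answer is cut off above. As a side note, a variant of the same idea (an assumption on my part, not taken from the truncated answer) would URL-encode the extras so values containing '&', '=' or spaces survive the round trip; the function name below is illustrative.

from urllib.parse import urlencode
from airflow.hooks.postgres_hook import PostgresHook

def pg_conn_id_to_uri_encoded(postgres_conn_id):
    # Same approach as the question's helper, but with URL-encoded extras.
    hook = PostgresHook(postgres_conn_id)
    uri = hook.get_uri()
    extra = hook.get_connection(postgres_conn_id).extra_dejson
    if extra:
        uri += ("&" if "?" in uri else "?") + urlencode(extra)
    return uri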

Airflow dynamic dag creation

Submitted by 穿精又带淫゛_ on 2020-05-16 03:10:51
Question: Can someone please tell me whether a DAG in airflow is just a graph (like a placeholder) without any actual data (like arguments) associated with it, OR whether a DAG is like an instance (for a fixed argument)? I want a system where the set of operations to perform (given a set of arguments) is fixed. But this input will be different every time the set of operations is run. In simple terms, the pipeline is the same but the arguments to the pipeline will be different every time it is run. I want to know
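One common pattern for "same pipeline, different arguments per run" (a sketch of one option, not necessarily what the asker ends up using): define the DAG once, leave it unscheduled, pass the per-run arguments as the trigger's conf payload, and read them from dag_run.conf inside the tasks. All names below are illustrative.

from datetime import datetime
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

dag = DAG(
    "parameterized_pipeline",
    start_date=datetime(2020, 1, 1),
    schedule_interval=None,  # only runs when triggered explicitly
    catchup=False,
)

def run_step(**context):
    # Arguments supplied at trigger time, e.g.:
    #   airflow trigger_dag parameterized_pipeline -c '{"input_path": "s3://bucket/file"}'
    conf = context["dag_run"].conf or {}
    input_path = conf.get("input_path", "default/path")
    print(f"processing {input_path}")

run_step_task = PythonOperator(
    task_id="run_step",
    python_callable=run_step,
    provide_context=True,  # Airflow 1.10.x: needed to receive dag_run in the kwargs
    dag=dag,
)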

airflow initdb: ImportError: cannot import name 'HTMLString'

Submitted by 夙愿已清 on 2020-05-16 02:45:08
Question: I'm getting ImportError: cannot import name 'HTMLString' when running airflow initdb

File "/home/ubuntu/airflow_env/bin/airflow", line 26, in <module>
    from airflow.bin.cli import CLIFactory
File "/home/ubuntu/airflow_env/lib/python3.6/site-packages/airflow/bin/cli.py", line 71, in <module>
    from airflow.www_rbac.app import cached_app as cached_app_rbac
File "/home/ubuntu/airflow_env/lib/python3.6/site-packages/airflow/www_rbac/app.py", line 27, in <module>
    from flask_appbuilder import AppBuilder

How to export large data from Postgres to S3 using Cloud composer?

Submitted by 廉价感情. on 2020-05-15 18:34:06
Question: I have been using the Postgres to S3 operator to load data from Postgres to S3. But recently, I had to export a very large table and my Airflow Composer fails without any logs. This could be because we are using the NamedTemporaryFile function of Python's tempfile module to create a temporary file, and we are using this temporary file to load to S3. Since we are using Composer, this will be loaded into Composer's local memory, and since the size of the file is very large, it is failing. Refer
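As a point of comparison, a minimal sketch of streaming the export in chunks so the full result never has to fit in Composer's local disk or memory, instead of one huge NamedTemporaryFile. Assumptions: Airflow 1.10.x hook import paths, and placeholder connection ids, table, bucket and key.

import csv
import io

from airflow.hooks.postgres_hook import PostgresHook
from airflow.hooks.S3_hook import S3Hook

def export_table_to_s3(table, bucket, key, flush_bytes=50 * 1024 * 1024):
    pg = PostgresHook(postgres_conn_id="postgres_default")  # placeholder conn id
    s3 = S3Hook(aws_conn_id="aws_default")                  # placeholder conn id

    conn = pg.get_conn()
    cursor = conn.cursor(name="s3_export_cursor")  # server-side cursor streams rows
    cursor.itersize = 50000
    cursor.execute("SELECT * FROM {}".format(table))

    part = 0
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row in cursor:
        writer.writerow(row)
        if buffer.tell() >= flush_bytes:
            # Upload each chunk as its own S3 object instead of one huge temp file.
            s3.load_string(buffer.getvalue(), key="{}.part{}".format(key, part),
                           bucket_name=bucket, replace=True)
            part += 1
            buffer = io.StringIO()
            writer = csv.writer(buffer)
    if buffer.tell():
        s3.load_string(buffer.getvalue(), key="{}.part{}".format(key, part),
                       bucket_name=bucket, replace=True)
    cursor.close()
    conn.close()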