airflow

Unable to start Airflow worker/flower and need clarification on Airflow architecture to confirm that the installation is correct

Anonymous (unverified), submitted on 2019-12-03 02:59:02
Question: Running a worker on a different machine results in the errors specified below. I have followed the configuration instructions and have synced the dags folder. I would also like to confirm that RabbitMQ and PostgreSQL only need to be installed on the Airflow core machine and do not need to be installed on the workers (the workers only connect to the core). The setup is specified below:

Airflow core/server machine has the following installed: Python 2.7 with airflow (AIRFLOW_HOME = ~/airflow), celery, psycopg2, RabbitMQ
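
A quick way to confirm the split described above (broker and metadata DB only on the core machine, workers just pointing at them) is to compare the settings each machine actually loads. A minimal sketch, assuming an Airflow 1.x install; the import path and the result-backend key name vary slightly across versions:

    from airflow import configuration as conf

    # Run this on the core machine and on each worker; the three values should match,
    # all pointing at the core machine's RabbitMQ (broker) and PostgreSQL (metadata DB).
    print("broker_url       :", conf.get("celery", "broker_url"))
    print("result_backend   :", conf.get("celery", "celery_result_backend"))  # "result_backend" on newer releases
    print("sql_alchemy_conn :", conf.get("core", "sql_alchemy_conn"))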

What is the difference between airflow trigger rule “all_done” and “all_success”?

Anonymous (unverified), submitted on 2019-12-03 02:59:02
Question: One of the requirements in the workflow I am working on is to wait for some event to happen within a given time; if it does not happen, the task is marked as failed, but the downstream task should still be executed. I am wondering whether "all_done" means that all dependency tasks are done, regardless of whether they succeeded.

Answer 1: https://airflow.incubator.apache.org/concepts.html#trigger-rules

all_done means all operations have finished working. Maybe they succeeded, maybe not.
all_success means all operations have finished without error.

So your guess is correct.
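
A minimal sketch of the difference, with placeholder task names (the failing PythonOperator stands in for whatever sensor or check waits for the event):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator
    from airflow.operators.python_operator import PythonOperator
    from airflow.utils.trigger_rule import TriggerRule


    def wait_for_event():
        # Stand-in for the real "wait for the event" logic; raising here simulates
        # the timeout case and marks the task as failed.
        raise ValueError("event did not happen in time")


    dag = DAG("trigger_rule_demo", start_date=datetime(2019, 1, 1), schedule_interval=None)

    wait = PythonOperator(task_id="wait_for_event", python_callable=wait_for_event, dag=dag)

    # Runs whether wait_for_event succeeded or failed: all_done only requires the
    # upstream task to have finished, not to have succeeded. With the default
    # all_success rule this task would be marked upstream_failed instead.
    downstream = DummyOperator(task_id="downstream", trigger_rule=TriggerRule.ALL_DONE, dag=dag)

    wait >> downstream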

Airflow MySQL to GCP DAG error

Anonymous (unverified), submitted on 2019-12-03 02:38:01
Question: I recently started working with Airflow. I'm working on a DAG that:

1. Queries the MySQL database
2. Extracts the query result and stores it in a Cloud Storage bucket as a JSON file
3. Uploads the stored JSON file to BigQuery

The DAG imports three operators: MySqlOperator, MySqlToGoogleCloudStorageOperator and GoogleCloudStorageToBigQueryOperator. I am using Airflow 1.8.0, Python 3, and Pandas 0.19.0. Here is my DAG code:

    sql2gcp_csv = MySqlToGoogleCloudStorageOperator(
        task_id='sql2gcp_csv',
        sql='airflow_gcp/aws_sql_extract_7days.sql',
        bucket='gs://{{var.value.gcs
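
The question is cut off above, but for reference, a hedged sketch of how these two contrib operators are typically wired together. The connection ids, bucket, dataset and table names below are placeholders, not taken from the question; note that the bucket parameters expect a bare bucket name rather than a gs:// URL.

    from datetime import datetime

    from airflow import DAG
    from airflow.contrib.operators.mysql_to_gcs import MySqlToGoogleCloudStorageOperator
    from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator

    dag = DAG("mysql_to_bq_example", start_date=datetime(2019, 1, 1), schedule_interval="@daily")

    # Export MySQL rows to newline-delimited JSON files in Cloud Storage,
    # together with a BigQuery-compatible schema file.
    extract = MySqlToGoogleCloudStorageOperator(
        task_id="mysql_to_gcs",
        sql="SELECT * FROM some_table WHERE created_at >= '{{ ds }}'",
        bucket="my-staging-bucket",                 # bucket name only, no gs:// prefix
        filename="exports/{{ ds }}/data_{}.json",   # {} is filled in if the export is split into chunks
        schema_filename="exports/{{ ds }}/schema.json",
        mysql_conn_id="mysql_default",
        google_cloud_storage_conn_id="google_cloud_default",
        dag=dag,
    )

    # Load the exported file(s) into BigQuery using the generated schema.
    load = GoogleCloudStorageToBigQueryOperator(
        task_id="gcs_to_bq",
        bucket="my-staging-bucket",
        source_objects=["exports/{{ ds }}/data_0.json"],
        schema_object="exports/{{ ds }}/schema.json",
        destination_project_dataset_table="my_project.my_dataset.my_table",
        source_format="NEWLINE_DELIMITED_JSON",
        write_disposition="WRITE_TRUNCATE",
        google_cloud_storage_conn_id="google_cloud_default",
        bigquery_conn_id="bigquery_default",
        dag=dag,
    )

    extract >> load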

Airflow: how to delete a DAG?

Anonymous (unverified), submitted on 2019-12-03 02:05:01
Question: I have started the Airflow webserver and scheduled some DAGs. I can see the DAGs in the web GUI. How can I delete a particular DAG so that it is no longer run or shown in the web GUI? Is there an Airflow CLI command to do that? I looked around but could not find an answer for a simple way of deleting a DAG once it has been loaded and scheduled.

Answer 1: This is my adapted code using PostgresHook with the default connection_id.

    import sys
    from airflow.hooks.postgres_hook import PostgresHook

    dag_input = sys.argv[1]
    hook = PostgresHook(postgres_conn_id="airflow_db")
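
The answer is truncated above; a minimal sketch in the same spirit, assuming the Airflow 1.x metadata table names and an airflow_db Postgres connection, might look like the following (back up the metadata database before running anything like this). Note that Airflow 1.10+ also ships an "airflow delete_dag <dag_id>" CLI command for the same purpose.

    import sys

    from airflow.hooks.postgres_hook import PostgresHook

    dag_id = sys.argv[1]
    hook = PostgresHook(postgres_conn_id="airflow_db")

    # Delete all metadata rows that reference this DAG id, child tables first.
    for table in ["xcom", "task_instance", "sla_miss", "log", "job", "dag_run", "dag"]:
        hook.run("DELETE FROM {} WHERE dag_id = %s".format(table), parameters=(dag_id,))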

Airflow 1.9.0 is queuing but not launching tasks

Anonymous (unverified), submitted on 2019-12-03 01:58:03
Question: Airflow is randomly not running queued tasks; some tasks don't even get a queued status. I keep seeing the line below in the scheduler logs:

    [2018-02-28 02:24:58,780] {jobs.py:1077} INFO - No tasks to consider for execution.

I do see tasks in the database that either have no status or a queued status, but they never get started. The Airflow setup is running https://github.com/puckel/docker-airflow on ECS with Redis. There are 4 scheduler threads and 4 Celery worker tasks. The tasks that are not running show a queued state (grey icon) when hovering
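
When debugging a setup like this, one simple check is to list the task instances sitting in the queued state straight from the metadata database. A small sketch using Airflow 1.x import paths:

    # List task instances currently stuck in the "queued" state.
    from airflow import settings
    from airflow.models import TaskInstance
    from airflow.utils.state import State

    session = settings.Session()
    stuck = session.query(TaskInstance).filter(TaskInstance.state == State.QUEUED).all()
    for ti in stuck:
        print(ti.dag_id, ti.task_id, ti.execution_date)
    session.close()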

Airflow tasks get stuck at “queued” status and never get running

Anonymous (unverified), submitted on 2019-12-03 01:48:02
Question: I'm using Airflow v1.8.1 and run all components (worker, web, flower, scheduler) on Kubernetes & Docker. I use the Celery executor with Redis, and my tasks look like:

    (start) -> (do_work_for_product1)
            ├ -> (do_work_for_product2)
            ├ -> (do_work_for_product3)
            ├ …

So the start task has multiple downstream tasks. I set up the concurrency-related configuration as below:

    parallelism = 3
    dag_concurrency = 3
    max_active_runs = 1

Then, when I run this DAG manually (not sure whether it also happens on scheduled runs), some downstream tasks get executed, but others
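
For reference, a minimal sketch of the fan-out shape described above with the per-DAG concurrency knobs set explicitly (the DAG id and task names are placeholders; parallelism and dag_concurrency themselves are set in airflow.cfg):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator

    dag = DAG(
        "fanout_example",
        start_date=datetime(2019, 1, 1),
        schedule_interval=None,
        concurrency=3,        # per-DAG cap on simultaneously running tasks
        max_active_runs=1,    # only one DAG run at a time
    )

    start = DummyOperator(task_id="start", dag=dag)
    for i in range(1, 4):
        start >> DummyOperator(task_id="do_work_for_product{}".format(i), dag=dag)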

Use the Airflow Hive operator and output to a text file

Anonymous (unverified), submitted on 2019-12-03 01:39:01
Question: I want to execute a Hive query using the Airflow Hive operator and output the result to a file. I don't want to use INSERT OVERWRITE here.

    hive_ex = HiveOperator(
        task_id='hive-ex',
        hql='/sql/hive-ex.sql',
        hiveconfs={
            'DAY': '{{ ds }}',
            'YESTERDAY': '{{ yesterday_ds }}',
            'OUTPUT': '{{ file_path }}' + 'csv',
        },
        dag=dag
    )

What is the best way to do this? I know how to do it using the bash operator, but want to know whether we can use the Hive operator:

    hive_ex = BashOperator(
        task_id='hive-ex',
        bash_command='hive -f hive.sql -DAY={{ ds }} >> {{ file_path }}
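
The BashOperator snippet above is cut off; as an alternative sketch (not taken from the question), the query can also be run through HiveServer2Hook inside a PythonOperator, writing the result set to a CSV file. The connection id, table name and file path are placeholders, and the exact to_csv signature may differ between Airflow versions:

    from datetime import datetime

    from airflow import DAG
    from airflow.hooks.hive_hooks import HiveServer2Hook
    from airflow.operators.python_operator import PythonOperator


    def hive_to_file(ds, **kwargs):
        # Run the query via HiveServer2 and dump the rows to a local CSV file.
        hook = HiveServer2Hook(hiveserver2_conn_id="hiveserver2_default")
        hook.to_csv(
            hql="SELECT * FROM my_table WHERE dt = '{}'".format(ds),
            csv_filepath="/tmp/hive_output_{}.csv".format(ds),
        )


    dag = DAG("hive_to_file_example", start_date=datetime(2019, 1, 1), schedule_interval="@daily")

    hive_to_file_task = PythonOperator(
        task_id="hive_to_file",
        python_callable=hive_to_file,
        provide_context=True,
        dag=dag,
    )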

How can I restart the Airflow webserver on Google Cloud Composer?

Anonymous (unverified), submitted on 2019-12-03 01:39:01
Question: When I need to restart the webserver locally I do:

    ps -ef | grep airflow | awk '{print $2}' | xargs kill -9
    airflow webserver -p 8080 -D

How can I do this on Google Composer? I don't see an option to restart the server in the console.

Answer 1: Since Cloud Composer is a managed Apache Airflow service, it is not possible to restart the whole service. You can, however, restart the single instances of the service, as described here, but this will not help to apply the plugin changes. To apply the plugin changes, you should install

Google Cloud Composer and Google Cloud SQL

Anonymous (unverified), submitted on 2019-12-03 01:38:01
Question: What ways are available to connect to a Google Cloud SQL (MySQL) instance from the newly introduced Google Cloud Composer? The intention is to get data from a Cloud SQL instance into BigQuery (perhaps with an intermediary step through Cloud Storage). Can the Cloud SQL Proxy be exposed in some way on pods that are part of the Kubernetes cluster hosting Composer? If not, can the Cloud SQL Proxy be brought in by using the Kubernetes Service Broker? -> https://cloud.google.com/kubernetes-engine/docs/concepts/add-on/service-broker Should Airflow be