airflow

Airflow DAG id access in sub-tasks

Submitted by 我们两清 on 2019-12-13 14:21:00
Question: I have a DAG with three bash tasks which is scheduled to run every day. I would like to access the unique ID of the DAG run (maybe a PID) in all bash scripts. Is there any way to do this? I am looking for functionality similar to Oozie, where we can access WORKFLOW_ID in workflow XML or Java code. Can somebody point me to Airflow documentation on how to use built-in and custom variables in an Airflow DAG? Many thanks, Pari

Answer 1: Object attributes can be accessed with dot notation in Jinja2 (see …
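For illustration only, a minimal sketch of the Jinja-templating approach the answer points at: Airflow exposes the run's identifiers (run_id, dag, ds, and so on) in the template context, so a BashOperator can hand them to a script. The dag_id, task_id and echo command below are made up, not from the original post.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    dag = DAG(
        dag_id="example_run_id_access",          # hypothetical DAG id
        start_date=datetime(2019, 12, 1),
        schedule_interval="@daily",
    )

    # {{ run_id }}, {{ dag.dag_id }} and {{ ds }} are rendered by Airflow before the command runs.
    print_ids = BashOperator(
        task_id="print_ids",
        bash_command='echo "dag_id={{ dag.dag_id }} run_id={{ run_id }} exec_date={{ ds }}"',
        dag=dag,
    )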

How to install Apache Airflow from GitHub

Submitted by ↘锁芯ラ on 2019-12-13 13:18:11
Question: I want to install apache-airflow using the latest version of Apache Airflow on GitHub, with all the dependencies. How can I do that using pip? Also, is it safe to use that in a production environment?

Answer 1: Using pip:

    $ pip install git+https://github.com/apache/incubator-airflow.git@v1-10-stable

Yes, it is safe. You will need gcc.

Answer 2: I generally use this:

    $ pip install git+https://github.com/apache/incubator-airflow.git@v1-10-stable#egg=apache-airflow[async,crypto,celery,kubernetes …

How to obtain and process MySQL records using Airflow?

Submitted by 北城以北 on 2019-12-13 11:55:01
Question: I need to (1) run a select query on a MySQL DB and fetch the records, and (2) process the records with a Python script. I am unsure about the way I should proceed. Is XCom the way to go here? Also, MySqlOperator only executes the query; it doesn't fetch the records. Is there any built-in transfer operator I can use? How can I use a MySQL hook here? You may want to use a PythonOperator that uses the hook to get the data, apply the transformation, and ship the (now scored) rows back to some other place. Can someone …
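A hedged sketch of the PythonOperator-plus-hook pattern mentioned above; the connection id, query and transformation are placeholders rather than anything from the original question.

    from datetime import datetime

    from airflow import DAG
    from airflow.hooks.mysql_hook import MySqlHook
    from airflow.operators.python_operator import PythonOperator

    def fetch_and_process(**context):
        hook = MySqlHook(mysql_conn_id="mysql_default")               # hypothetical connection id
        rows = hook.get_records("SELECT id, value FROM some_table")   # fetches rows, unlike MySqlOperator
        processed = [(row[0], row[1] * 2) for row in rows]            # placeholder transformation
        # Small results can be returned (and land in XCom); large ones should be written elsewhere.
        return len(processed)

    dag = DAG(
        dag_id="mysql_fetch_and_process",                             # hypothetical DAG id
        start_date=datetime(2019, 12, 1),
        schedule_interval="@daily",
    )

    fetch_task = PythonOperator(
        task_id="fetch_and_process",
        python_callable=fetch_and_process,
        provide_context=True,
        dag=dag,
    )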

Getting error when trying to install apache-airflow on Mac. How can I fix this?

Submitted by 房东的猫 on 2019-12-13 11:16:02
Question: Error output below:

    ronakvora:dtc ronakvora$ pip install apache-airflow
    Installing build dependencies ... done
    Complete output from command python setup.py egg_info:
    running egg_info
    creating pip-egg-info/pendulum.egg-info
    writing requirements to pip-egg-info/pendulum.egg-info/requires.txt
    writing pip-egg-info/pendulum.egg-info/PKG-INFO
    writing top-level names to pip-egg-info/pendulum.egg-info/top_level.txt
    writing dependency_links to pip-egg-info/pendulum.egg-info/dependency_links.txt …

Listing variables from an external Python script without the built-in and imported variables

Submitted by 為{幸葍}努か on 2019-12-13 03:47:55
Question: I'm writing a Python script that gets parameters from a JSON file, opens a template script specified by one of the parameters (from a set of options), and generates a new Python script, replacing some of the template script with the parameters given in the JSON. I'm currently trying to list all variables from the template as follows:

    list = [item for item in dir(imported_script) if not item.startswith("__")]

so I can use the list to iterate over the variables and write them in the new script.
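One possible refinement (an assumption about intent, not the poster's final code): filter out modules, functions and classes as well as dunder names, so only plain data attributes of the imported template survive. The module name below is hypothetical.

    import inspect
    import types

    import template_script as imported_script   # hypothetical template module

    # Keep only plain data attributes: skip dunders, imported modules, functions and classes.
    variable_names = [
        name for name in dir(imported_script)
        if not name.startswith("__")
        and not isinstance(getattr(imported_script, name), types.ModuleType)
        and not inspect.isfunction(getattr(imported_script, name))
        and not inspect.isclass(getattr(imported_script, name))
    ]
    print(variable_names)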

Execute Airflow DAG instances (tasks) on a list of specific dates

Submitted by 孤者浪人 on 2019-12-13 03:46:48
Question: I would like to manage a couple of future releases using Apache Airflow. All of these releases are known well in advance, and I need to make sure some data pushing won't be forgotten. The problem is that those future releases do not follow a simple periodic schedule that could be handled with a classic cron expression like 0 1 23 * * or something like @monthly. It's rather 2019-08-24, 2019-09-30, 2019-10-20, and so on. Is there another way but to create a separate mydag.py file for all of those future releases?
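One hedged alternative to a file per release (a sketch, not the thread's accepted answer): run a single daily DAG and short-circuit everything unless the execution date is in the known release list. The dag_id, dates and downstream task are illustrative.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator
    from airflow.operators.python_operator import ShortCircuitOperator

    RELEASE_DATES = {"2019-08-24", "2019-09-30", "2019-10-20"}   # known release dates

    dag = DAG(
        dag_id="release_pushes",                 # hypothetical DAG id
        start_date=datetime(2019, 8, 1),
        schedule_interval="@daily",
    )

    # ds is the execution date as YYYY-MM-DD; downstream tasks are skipped on all other days.
    is_release_date = ShortCircuitOperator(
        task_id="is_release_date",
        python_callable=lambda ds, **kwargs: ds in RELEASE_DATES,
        provide_context=True,
        dag=dag,
    )

    push_data = DummyOperator(task_id="push_data", dag=dag)     # placeholder for the real work

    is_release_date >> push_data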

RabbitMQ /usr/local/etc/rabbitmq/rabbitmq-env.conf missing

Submitted by 痴心易碎 on 2019-12-13 03:38:39
Question: I just installed RabbitMQ on an AWS EC2 instance (CentOS) using the following:

    sudo yum install erlang
    sudo yum install rabbitmq-server

I was then able to successfully turn it on using

    sudo chkconfig rabbitmq-server on
    sudo /sbin/service rabbitmq-server start

and

    sudo /sbin/service rabbitmq-server stop
    sudo rabbitmq-server    # run in the foreground

But now I'm trying to modify the /usr/local/etc/rabbitmq/rabbitmq-env.conf file so I can change the NODE_IP_ADDRESS, but the file is nowhere to …
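For reference, a minimal rabbitmq-env.conf sketch. Note that with a CentOS package install the file usually lives under /etc/rabbitmq/ rather than /usr/local/etc/ (the Homebrew location on macOS), and it may simply need to be created; the bind address below is only an example.

    # /etc/rabbitmq/rabbitmq-env.conf  (create the file if it does not exist)
    NODE_IP_ADDRESS=0.0.0.0    # example bind address; adjust to your needs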

Problems making database requests in Airflow

Submitted by 一笑奈何 on 2019-12-13 03:31:02
Question: I am trying to create tasks dynamically based on the response of a database call. But when I do this, the run option just doesn't appear in Airflow, so I can't run the DAG. Here is the code:

    tables = ['a', 'b', 'c']   # This works
    # tables = get_tables()    # This never works

    check_x = python_operator.PythonOperator(
        task_id="verify_loaded",
        python_callable=lambda: verify_loaded(tables),
    )

    bridge = DummyOperator(task_id='bridge')

    check_x >> bridge

    for vname in tables:
        sql = ("SELECT * FROM `asd.temp.{table}` LIMIT …
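A hedged sketch of one common workaround, continuing the snippet above (python_operator, dag and bridge are assumed from the question's code, and process_table is a placeholder): the table list must be resolvable every time the scheduler parses the file, so it is often cached in an Airflow Variable instead of calling the database at parse time.

    import json

    from airflow.models import Variable

    # Resolve the table list at parse time from an Airflow Variable ("table_list" is hypothetical),
    # so the scheduler can always build the same, stable set of task ids.
    tables = json.loads(Variable.get("table_list", default_var='["a", "b", "c"]'))

    for vname in tables:
        process = python_operator.PythonOperator(
            task_id="process_{}".format(vname),
            python_callable=lambda table=vname: process_table(table),  # process_table is a placeholder
            dag=dag,
        )
        bridge >> process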

Is it possible to turn off the VMs hosting Google Cloud Composer at certain hours?

Submitted by 谁说我不能喝 on 2019-12-13 03:25:05
Question: In order to reduce the billing associated with running Google Cloud Composer, I am wondering about the possibility of turning off the VM instances that run the environment at certain hours. For example: most of our DAGs run either in the morning or the afternoon, so we would like to turn off the VMs during the night, or even during midday if possible. I know we can disable the environments manually from the Google Cloud console, but it would be great to find a way to do this …

Apache Airflow does not pickle DAGs

Submitted by 不羁岁月 on 2019-12-13 03:16:27
Question: I would like to recover DAG objects so that I can better inspect certain dependencies after DAG runs (e.g. what data is consumed by specific operators). I am using postgres:9.6 as the metadata database backend. This seems to be supported via the donot_pickle configuration variable, which by default indicates all DAGs must be pickled:

    [core]
    # Whether to disable pickling dags
    donot_pickle = False

I have some test DAGs (3) available, but their corresponding pickle_id is empty:

    > select pickle_id …
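A hedged way to check whether any pickles were actually written, assuming Airflow 1.10's metadata models and that the scheduler was started with its pickling option (otherwise pickle_id stays NULL); this is a diagnostic sketch, not the poster's setup.

    from airflow.models import DagModel, DagPickle
    from airflow.settings import Session

    session = Session()
    # pickle_id on DagModel stays NULL until a pickle for that DAG is stored in the dag_pickle table.
    for dag_row in session.query(DagModel).all():
        print(dag_row.dag_id, dag_row.pickle_id)
    print(session.query(DagPickle).count(), "pickles stored")
    session.close()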