airflow

Airflow: How to SSH and run BashOperator from a different server

核能气质少年 submitted on 2019-11-29 02:17:36
Question: Is there a way to SSH to a different server and run a BashOperator using Airbnb's Airflow? I am trying to run a Hive SQL command with Airflow, but I need to SSH to a different box in order to run the Hive shell. My tasks should look like this: SSH to server1, start the Hive shell, run the Hive command. Thanks! Answer 1: I think that I just figured it out: Create an SSH connection in the UI under Admin > Connections. Note: the connection will be deleted if you reset the database. In the Python file add the following from
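The answer's code is cut off above, but a minimal sketch of the idea it describes is possible, assuming Airflow 1.10's contrib SSHOperator, an SSH connection created under Admin > Connections with the hypothetical id "ssh_server1", and a placeholder Hive command:

    from airflow import DAG
    from airflow.contrib.operators.ssh_operator import SSHOperator
    from datetime import datetime

    dag = DAG("remote_hive", start_date=datetime(2019, 1, 1), schedule_interval=None)

    # Runs the command on server1 over the SSH connection configured in the UI,
    # so no Hive client is needed on the Airflow host itself.
    run_hive = SSHOperator(
        task_id="run_hive_on_server1",
        ssh_conn_id="ssh_server1",           # hypothetical connection id from Admin > Connections
        command='hive -e "SHOW TABLES;"',    # placeholder Hive command
        dag=dag,
    )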

Is there a way to create/modify connections through Airflow API

我是研究僧i submitted on 2019-11-29 01:26:01
Question: Going through Admin -> Connections, we have the ability to create/modify a connection's params, but I'm wondering if I can do the same through the API so I can set connections programmatically. airflow.models.Connection seems like it only deals with actually connecting to the instance instead of saving it to the list. It seems like a function that should have been implemented, but I'm not sure where I can find the docs for this specific function. Answer 1: Connection is actually a model which you
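Since Connection is an ORM model backed by the metadata database, one way to create a connection programmatically is to add a row through a SQLAlchemy session. A hedged sketch with placeholder values (the conn_id, host, and credentials are all examples):

    from airflow import settings
    from airflow.models import Connection

    # Build the connection object just like the Admin > Connections form would.
    conn = Connection(
        conn_id="my_conn_id",
        conn_type="http",
        host="example.com",
        login="user",
        password="secret",
        port=443,
    )

    # Persist it to the metadata database so operators can look it up by conn_id.
    session = settings.Session()
    session.add(conn)
    session.commit()
    session.close()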

Unable to start Airflow worker/flower and need clarification on Airflow architecture to confirm that the installation is correct

一曲冷凌霜 submitted on 2019-11-28 23:38:57
Running a worker on a different machine results in the errors specified below. I have followed the configuration instructions and have synced the dags folder. I would also like to confirm that RabbitMQ and PostgreSQL only need to be installed on the Airflow core machine and do not need to be installed on the workers (the workers only connect to the core). The specification of the setup is detailed below: Airflow core/server computer has the following installed: Python 2.7 with airflow (AIRFLOW_HOME = ~/airflow), celery, psycopg2, RabbitMQ, PostgreSQL. Configurations made in airflow.cfg: sql_alchemy
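A hedged sanity check, assuming CeleryExecutor on Airflow 1.x: on every machine, core and workers alike, these settings should point at the core host, since RabbitMQ and PostgreSQL are installed only there.

    from airflow.configuration import conf

    # All of these come from airflow.cfg; the expected values in the comments
    # illustrate the setup described above, they are not required strings.
    print(conf.get("core", "executor"))          # CeleryExecutor
    print(conf.get("core", "sql_alchemy_conn"))  # postgresql+psycopg2://...@<core-host>/airflow
    print(conf.get("celery", "broker_url"))      # amqp://...@<core-host>:5672/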

How to run bash script file in Airflow

a 夏天 submitted on 2019-11-28 21:27:42
I have a bash script that creates a file (if it does not exist) that I want to run in Airflow, but when I try it fails. How do I do this?

#!/bin/bash
#create_file.sh
file=filename.txt
if [ ! -e "$file" ] ; then
    touch "$file"
fi
if [ ! -w "$file" ] ; then
    echo cannot write to $file
    exit 1
fi

and here's how I'm calling it in Airflow:

create_command = """ ./scripts/create_file.sh """
t1 = BashOperator(
    task_id='create_file',
    bash_command=create_command,
    dag=dag
)

which fails with:

lib/python2.7/site-packages/airflow/operators/bash_operator.py", line 83, in execute raise AirflowException("Bash command failed")
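A hedged rework of the call above, assuming the script lives at a fixed location on the worker (the path below is a placeholder): BashOperator runs its command from a temporary working directory, so a relative "./scripts/..." path often fails, and the usual fix is an absolute path plus execute permission on the script.

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    from datetime import datetime

    dag = DAG("create_file_dag", start_date=datetime(2019, 1, 1), schedule_interval=None)

    # The trailing space stops Airflow from treating a string ending in ".sh"
    # as a Jinja template file to be resolved relative to the DAG folder.
    create_command = "bash /home/airflow/scripts/create_file.sh "

    t1 = BashOperator(
        task_id="create_file",
        bash_command=create_command,
        dag=dag,
    )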

Airflow parallelism

烂漫一生 submitted on 2019-11-28 16:32:12
The LocalExecutor spawns new processes while scheduling tasks. Is there a limit to the number of processes it creates? I needed to change it. I need to know the difference between the scheduler's "max_threads" and "parallelism" in airflow.cfg. parallelism: not a very descriptive name. The description says it sets the maximum number of task instances for the Airflow installation, which is a bit ambiguous: if I have two hosts running Airflow workers, I'd have Airflow installed on two hosts, so that should be two installations, but based on context 'per installation' here means 'per Airflow state
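A small sketch, assuming a standard Airflow 1.x install, that reads the settings under discussion so you can see what your installation currently uses:

    from airflow.configuration import conf

    # Global cap on task instances running at once across the whole installation,
    # i.e. per metadata database / scheduler state, not per worker host.
    print(conf.getint("core", "parallelism"))

    # Number of threads the scheduler itself uses while scheduling DAGs.
    print(conf.getint("scheduler", "max_threads"))

    # Per-DAG cap on concurrently running task instances.
    print(conf.getint("core", "dag_concurrency"))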

Airflow s3 connection using UI

故事扮演 submitted on 2019-11-28 16:23:00
I've been trying to use Airflow to schedule a DAG. One of the DAGs includes a task which loads data from an S3 bucket, so I need to set up an S3 connection. But the UI provided by Airflow isn't that intuitive ( http://pythonhosted.org/airflow/configuration.html?highlight=connection#connections ). Has anyone succeeded in setting up the S3 connection? If so, are there any best practices you folks follow? Thanks. Anselmo: It's hard to find references, but after digging a bit I was able to make it work. TL;DR: Create a new connection with the following attributes: Conn Id: my_conn_S3 Conn Type: S3
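A hedged sketch, assuming Airflow 1.10 with S3 support installed: once a connection with Conn Id my_conn_S3 and Conn Type S3 exists (credentials typically go in the Extra field as JSON, e.g. {"aws_access_key_id": "...", "aws_secret_access_key": "..."}), a task can use it through S3Hook. Bucket and key names below are placeholders.

    from airflow.hooks.S3_hook import S3Hook

    def check_s3_file(**kwargs):
        # Looks up the "my_conn_S3" connection from the metadata database.
        hook = S3Hook(aws_conn_id="my_conn_S3")
        # Returns True if the object exists in the bucket.
        return hook.check_for_key("path/to/file.csv", bucket_name="my-bucket")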

Apache Airflow scheduler does not trigger DAG at schedule time

牧云@^-^@ submitted on 2019-11-28 14:15:20
When I schedule DAGs to run at a specific time every day, the DAG execution does not take place at all. However, when I restart the Airflow webserver and scheduler, the DAGs execute once at the scheduled time for that particular day and do not execute from the next day onwards. I am using Airflow version v1.7.1.3 with Python 2.7.6. Here goes the DAG code:

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta
import time

n = time.strftime("%Y,%m,%d")
v = datetime.strptime(n, "%Y,%m,%d")
default_args = {
    'owner': 'airflow',
    'depends_on
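The excerpt computes its start date from the current time at parse time, and a moving start_date is a common reason DAGs are never scheduled. A hedged sketch of the usual alternative, with example dates and times: a fixed start_date in the past, with the run time expressed in schedule_interval.

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    from datetime import datetime

    default_args = {
        "owner": "airflow",
        "start_date": datetime(2019, 1, 1),  # fixed date in the past, never "now"
    }

    dag = DAG(
        "daily_job",
        default_args=default_args,
        schedule_interval="30 8 * * *",  # run at 08:30 every day
    )

    t1 = BashOperator(task_id="say_hello", bash_command="echo hello", dag=dag)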

How to pull xcom value from other task instance in the same DAG run (not the most recent one)?

喜欢而已 submitted on 2019-11-28 12:49:46
Question: I have 3 DAG runs:

DAGR 1 executed at 2019-02-13 16:00:00
DAGR 2 executed at 2019-02-13 17:00:00
DAGR 3 executed at 2019-02-13 18:00:00

In a task instance X of DAGR 1 I want to get the XCom value of task instance Y. I did this: kwargs['task_instance'].xcom_pull(task_ids='Y') I expected to get the XCom value from task instance Y in DAGR 1. Instead I got it from DAGR 3. From the Airflow documentation: If xcom_pull is passed a single string for task_ids, then the most recent XCom value from that task is
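A hedged sketch, assuming Airflow 1.10 and a PythonOperator run with provide_context=True: querying XCom with the current run's execution_date pins the lookup to this DAG run instead of whichever run wrote most recently. Task and key names are placeholders.

    from airflow.models import XCom

    def pull_from_same_run(**kwargs):
        # execution_date and dag come from the task context, so the query is
        # scoped to the DAG run this task instance belongs to.
        return XCom.get_one(
            execution_date=kwargs["execution_date"],
            task_id="Y",                 # task that pushed the value
            dag_id=kwargs["dag"].dag_id,
            key="return_value",          # default key for values returned by a task
        )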

Airflow 1.10 Installation Failing

不想你离开。 submitted on 2019-11-28 11:34:52
Question: I have a working Airflow environment using Airflow version 1.9 that is running on an Amazon EC2 instance. I need to upgrade to the latest version of Airflow, which is 1.10. I have the option of either upgrading from version 1.9 or installing 1.10 fresh on a new server. Airflow version 1.10 is not listed on PyPI, so I'm installing it from Git via this command: pip-3.6 install git+git://github.com/apache/incubator-airflow.git@v1-10-stable This command successfully installs Airflow version 1.10.

DAGs not clickable on Google Cloud Composer webserver, but working fine on a local Airflow

为君一笑 submitted on 2019-11-28 11:15:30
I'm using Google Cloud Composer (managed Airflow on Google Cloud Platform) with image version composer-0.5.3-airflow-1.9.0 and Python 2.7, and I'm facing a weird issue: after importing my DAGs, they are not clickable from the Web UI (and there are no "Trigger DAG", "Graph view", ... buttons), while everything works perfectly when running a local Airflow. Even though they are unusable from the webserver on Composer, my DAGs still exist: I can list them using the CLI ( list_dags ), describe them ( list_tasks ), and even trigger them ( trigger_dag ). Minimal example reproducing the issue: A minimal example I used to