airflow

How to retrieve a value from Airflow XCom pushed via SSHExecuteOperator

ⅰ亾dé卋堺 submitted on 2019-12-09 13:35:12
Question: I have the following DAG with two SSHExecuteOperator tasks. The first task executes a stored procedure that returns a parameter; the second task needs this parameter as an input. Could you please explain how to pull the value from the XCom pushed in task1 so it can be used in task2?

    from airflow import DAG
    from datetime import datetime, timedelta
    from airflow.contrib.hooks.ssh_hook import SSHHook
    from airflow.contrib.operators.ssh_execute_operator import SSHExecuteOperator
    from airflow.models …
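A minimal sketch of one common approach, assuming the operator's xcom_push flag pushes the command's output the way the BashOperator's does: since bash_command is a templated field on the contrib SSHExecuteOperator, task2 can pull the value with Jinja at run time. The connection id, script names, and DAG settings below are placeholders.

    from datetime import datetime
    from airflow import DAG
    from airflow.contrib.hooks.ssh_hook import SSHHook
    from airflow.contrib.operators.ssh_execute_operator import SSHExecuteOperator

    dag = DAG("ssh_xcom_example", start_date=datetime(2019, 1, 1), schedule_interval=None)
    ssh_hook = SSHHook(conn_id="my_ssh_conn")  # hypothetical SSH connection (older contrib hook takes conn_id)

    # task1 runs the stored procedure and prints the parameter on stdout;
    # with xcom_push=True the output is pushed to XCom.
    task1 = SSHExecuteOperator(
        task_id="task1",
        bash_command="run_stored_procedure.sh",  # hypothetical script
        ssh_hook=ssh_hook,
        xcom_push=True,
        dag=dag)

    # task2 renders the Jinja expression when it runs and receives the value.
    task2 = SSHExecuteOperator(
        task_id="task2",
        bash_command="process.sh {{ ti.xcom_pull(task_ids='task1') }}",
        ssh_hook=ssh_hook,
        dag=dag)

    task1 >> task2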

Airflow - Task Instance in EMR operator

别等时光非礼了梦想. submitted on 2019-12-09 11:32:49
Question: In Airflow, I need to pass the job_flow_id to one of my EMR steps. I can retrieve the job_flow_id from the operator, but when I create the steps to submit to the cluster, the task_instance value is not right. I have the following code:

    def issue_step(name, args):
        return [
            {
                "Name": name,
                "ActionOnFailure": "CONTINUE",
                "HadoopJarStep": {
                    "Jar": "s3://....",
                    "Args": args
                }
            }
        ]

    dag = DAG('example', description='My dag', schedule_interval='0 8 * * …
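A minimal sketch of the templating approach, assuming the cluster is created by an EmrCreateJobFlowOperator: both job_flow_id and steps are templated fields on EmrAddStepsOperator, so a Jinja xcom_pull is rendered at run time rather than at DAG-parse time. Task ids, the step name, and the jar/args below are placeholders.

    from datetime import datetime
    from airflow import DAG
    from airflow.contrib.operators.emr_create_job_flow_operator import EmrCreateJobFlowOperator
    from airflow.contrib.operators.emr_add_steps_operator import EmrAddStepsOperator

    dag = DAG("emr_job_flow_example", start_date=datetime(2019, 1, 1), schedule_interval=None)

    create_job_flow = EmrCreateJobFlowOperator(
        task_id="create_job_flow",
        aws_conn_id="aws_default",
        emr_conn_id="emr_default",
        dag=dag)

    # The job flow id is only known at run time, so it is pulled from XCom
    # via Jinja inside the templated fields.
    job_flow_id = "{{ task_instance.xcom_pull(task_ids='create_job_flow', key='return_value') }}"

    add_steps = EmrAddStepsOperator(
        task_id="add_steps",
        job_flow_id=job_flow_id,
        steps=[{
            "Name": "my_step",  # hypothetical step name
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "--cluster-id", job_flow_id]},  # placeholder args
        }],
        aws_conn_id="aws_default",
        dag=dag)

    create_job_flow >> add_steps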

Status of an Airflow task within the DAG

只谈情不闲聊 submitted on 2019-12-09 09:49:48
Question: I need the status of a task within the same DAG, e.g. whether it is running, up_for_retry, or failed. I tried to get it with the code below, but got no output:

    Auto = PythonOperator(
        task_id='test_sleep',
        python_callable=execute_on_emr,
        op_kwargs={'cmd': 'python /home/hadoop/test/testsleep.py'},
        dag=dag)
    logger.info(Auto)

The intention is to kill certain running tasks once a particular task in Airflow completes. The question is how to get the state of a task, e.g. whether it is in the running state …
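One possible approach, sketched below under the assumption that the check runs from another task in the same DAG run: the dag_run object in the task context can list every TaskInstance and its state. The task ids and the kill logic are placeholders.

    from airflow.operators.python_operator import PythonOperator

    def check_task_states(**context):
        dag_run = context["dag_run"]
        for ti in dag_run.get_task_instances():
            # ti.state is e.g. "running", "up_for_retry", "failed" or "success"
            print(ti.task_id, ti.state)
            if ti.task_id == "test_sleep" and ti.state == "running":
                pass  # e.g. signal an external system to kill the job

    watcher = PythonOperator(
        task_id="watch_states",
        python_callable=check_task_states,
        provide_context=True,  # required on Airflow 1.x to receive the context
        dag=dag)  # assumes the dag object is defined elsewhere in the file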

Want to create Airflow tasks that are downstream of the current task

馋奶兔 submitted on 2019-12-08 17:31:31
I'm mostly brand new to Airflow. I have a two-step process: get all files that match a criterion, then uncompress them. The files are half a gigabyte compressed and 2-3 GB uncompressed, and I can easily have 20+ files to process at a time, which means uncompressing all of them can run longer than just about any reasonable timeout. I could use XCom to get the results of step 1, but what I'd like to do is something like this:

    def processFiles(reqDir, gvcfDir, matchSuffix):
        theFiles = getFiles(reqDir, gvcfDir, matchSuffix)
        for filePair in theFiles:
            task = PythonOperator(task_id="Uncompress_" + …
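A minimal sketch of the usual workaround, assuming the file list can be computed when the DAG file is parsed: tasks cannot be created from inside a running task, but one uncompress task per matching file can be generated at parse time. The directory, pattern, and callable below are placeholders.

    import glob
    import os
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    def uncompress(path, **_):
        print("uncompressing", path)  # placeholder for the real decompression

    dag = DAG("uncompress_files", start_date=datetime(2019, 1, 1), schedule_interval=None)

    for path in glob.glob("/data/requests/*.gz"):  # hypothetical location
        PythonOperator(
            task_id="Uncompress_" + os.path.basename(path).replace(".", "_"),
            python_callable=uncompress,
            op_kwargs={"path": path},
            dag=dag)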

Which version of MySQL is compatible with Airflow version 1.10?

≯℡__Kan透↙ submitted on 2019-12-08 09:39:10
Question: I am trying to use the LocalExecutor instead of the default SequentialExecutor, which forces me to use a database other than SQLite. I wanted to try MySQL, but I am seeing issues with MySQL versions 5.6 and 5.7, and I am not sure whether they are related to version compatibility. I would love to see any documentation on Airflow versions and their compatible MySQL versions. Update: here is the "Ooops" error I see in the UI when clicking any of the DAG-related buttons while using the MySQL backend:

    Traceback (most …
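A configuration sketch, assuming a local MySQL instance and hypothetical credentials; Airflow 1.10 expects MySQL 5.6.4 or newer and, with MySQL, expects explicit_defaults_for_timestamp to be enabled on the server, so the traceback is worth comparing against those requirements.

    # airflow.cfg (relevant settings; the connection string is a placeholder)
    executor = LocalExecutor
    sql_alchemy_conn = mysql://airflow_user:airflow_pass@localhost:3306/airflow

    # my.cnf, [mysqld] section -- Airflow 1.10 refuses to initialise the
    # metadata database without this flag
    explicit_defaults_for_timestamp = 1

After switching the backend, run airflow initdb so the metadata tables are created in MySQL.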

Add GCP credentials to airflow via command line

China☆狼群 submitted on 2019-12-08 05:50:39
Question: Airflow allows us to add connection information via the command line with airflow connections. This can help with automated deployment of Airflow installations via Ansible or other dev-ops tools. It is unclear how connections to Google Cloud Platform (service accounts) can be added to Airflow via the command line.

Answer 1: Pre Airflow 1.9, the following example outlines how to use a DAG to add connection information: https://gist.github.com/yu-iskw/42f9f0aa6f2ff0a2a375d43881e13b49

    def add_gcp_connection(ds, * …
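A sketch of the DAG-based route (not the gist's exact code), assuming a callable like the one below is run from a PythonOperator; it writes an airflow.models.Connection row of type google_cloud_platform directly to the metadata database. The connection id, project, and key path are placeholders.

    import json
    from airflow import settings
    from airflow.models import Connection

    def add_gcp_connection(ds, **kwargs):
        session = settings.Session()
        conn = Connection(
            conn_id="my_gcp_conn",  # hypothetical connection id
            conn_type="google_cloud_platform",
            extra=json.dumps({
                "extra__google_cloud_platform__project": "my-gcp-project",
                "extra__google_cloud_platform__key_path": "/keys/service-account.json",
                "extra__google_cloud_platform__scope": "https://www.googleapis.com/auth/cloud-platform",
            }))
        session.add(conn)
        session.commit()

On recent 1.x releases the same extras can reportedly be supplied on the command line via airflow connections --add with --conn_type=google_cloud_platform and a JSON --conn_extra, which fits the Ansible use case more directly.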

Airflow: How to get the return output of one task to set the dependencies of the downstream tasks to run?

老子叫甜甜 submitted on 2019-12-08 05:11:30
Question: We have a KubernetesPodOperator that spits out a Python dictionary defining which further downstream KubernetesPodOperators to run, along with their dependencies and the environment variables to pass into each operator. How do I get this Python dictionary back into the executor's context (or is it the worker's context?) so that Airflow can spawn the downstream Kubernetes operators? I've looked at BranchOperator, TriggerDagRunOperator, XCom push/pull, and Variable.get …
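One sketch of a common pattern, under the assumption that the set of possible downstream pod operators can be declared at DAG-parse time: a BranchPythonOperator pulls the dictionary from XCom when it runs and returns the task ids that should actually execute (returning a list needs a reasonably recent 1.10 release). The task ids and the shape of the dictionary are placeholders.

    from airflow.operators.python_operator import BranchPythonOperator

    def choose_downstream(**context):
        plan = context["ti"].xcom_pull(task_ids="planner")  # dict produced by the upstream pod
        # return the task_ids to run; every other direct downstream task is skipped
        return [task_id for task_id, enabled in plan.items() if enabled]

    branch = BranchPythonOperator(
        task_id="branch_on_plan",
        python_callable=choose_downstream,
        provide_context=True,
        dag=dag)  # assumes dag, the "planner" task, and the pod operators exist elsewhere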

Can I have tasks under one DAG with different start dates in Airflow?

不羁岁月 submitted on 2019-12-08 05:09:52
Question: I have a DAG which runs two tasks, A and B. Instead of specifying the start_date at the DAG level, I have added it as an attribute on the operators (I am using a PythonOperator in this case) and removed it from the DAG's default_args dictionary. Both tasks run daily. The start_date for A is 2013-01-01 and the start_date for B is 2015-01-01. My problem is that Airflow runs 16 days of task A from 2013-01-01 (I guess because I have left the default dag_concurrency = 16 in my airflow.cfg), and after that it …
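For reference, a minimal sketch of per-operator start dates, with placeholder callables: each PythonOperator carries its own start_date while the DAG only defines the schedule, which is what produces separate backfill windows for A and B.

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    def do_a():
        print("task A")

    def do_b():
        print("task B")

    dag = DAG("different_start_dates", schedule_interval="@daily")

    task_a = PythonOperator(task_id="A", python_callable=do_a,
                            start_date=datetime(2013, 1, 1), dag=dag)
    task_b = PythonOperator(task_id="B", python_callable=do_b,
                            start_date=datetime(2015, 1, 1), dag=dag)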

Run Stored Procedure in Airflow

蹲街弑〆低调 submitted on 2019-12-08 05:04:32
Question: I am trying to run my stored procedure in Airflow. Simply put, I imported the MsSql operator and tried to execute the following:

    sql_command = """ EXEC [spAirflowTest] """
    t3 = MsSqlOperator(
        task_id='run_test_proc',
        mssql_conn_id='FIConnection',
        sql=sql_command,
        dag=dag,
        database='RDW')

The task completes as successful, but the procedure is not actually executed. Because I get no error from the system, I also cannot identify the problem. To check whether the call arrived at my Microsoft SQL Server, I checked …
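A sketch of one thing worth checking, assuming the procedure writes data: MsSqlOperator has an autocommit flag that defaults to False, so a run can be reported as successful while the session's work is never committed. The connection id and database are taken from the question; the dag object is assumed to exist.

    from airflow.operators.mssql_operator import MsSqlOperator

    run_proc = MsSqlOperator(
        task_id="run_test_proc",
        mssql_conn_id="FIConnection",
        sql="EXEC [spAirflowTest]",
        database="RDW",
        autocommit=True,  # commit the procedure's changes instead of rolling back
        dag=dag)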

Airflow serving a static HTML directory

烈酒焚心 submitted on 2019-12-08 04:41:28
I have static HTML documentation built with Sphinx in $AIRFLOW_HOME/plugins/docs/. I want to create a new menu link "My Documentation" in the Airflow UI, so this works:

    class DocsView(BaseView):
        @expose("/")
        def my_docs(self):
            return send_from_directory(os.path.abspath("plugins/docs/build/html"), 'index.html')

    docs_view = DocsView(
        category="My Documentation",
        name="Plugins",
        endpoint="my_docs"
    )

And in my custom plugin class:

    class MyPlugin(AirflowPlugin):
        admin_views = [docs_view]

The link successfully shows in the menu bar and works, but only for index.html. I don't use templates and …
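A sketch of one way to serve the rest of the Sphinx build, assuming a Flask-Admin based plugin view (Airflow 1.x admin_views): a second route with a path converter hands every other page and its assets to send_from_directory. The directory constant mirrors the path from the question; method names are placeholders.

    import os
    from flask import send_from_directory
    from flask_admin import BaseView, expose

    DOCS_DIR = os.path.abspath("plugins/docs/build/html")

    class DocsView(BaseView):
        @expose("/")
        def index(self):
            return send_from_directory(DOCS_DIR, "index.html")

        @expose("/<path:filename>")
        def serve_file(self, filename):
            # send_from_directory refuses paths that escape DOCS_DIR
            return send_from_directory(DOCS_DIR, filename)

Relative links inside the Sphinx pages then resolve against the view's URL and fall through to the catch-all route.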