airflow

Airflow - Skip a future task instance without making changes to the DAG file

Submitted by 笑着哭i on 2020-06-26 06:16:30
Question: I have a DAG 'abc' scheduled to run every day at 7 AM CST, and there is a task 'xyz' in that DAG. For some reason, I do not want to run the task 'xyz' for tomorrow's instance. How can I skip that particular task instance? I do not want to make any changes to the code, as I do not have access to the Prod code and the task is now in the Prod environment. Is there any way to do this using the command line? Appreciate any help on this. Answer 1: You can mark the unwanted tasks as succeeded using the run command.
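A minimal sketch of what that could look like from the command line, assuming the Airflow 1.x CLI; the DAG id, task id and execution date are taken from the question as placeholders and would need to match the actual run you want to skip:

```bash
# Mark the task instance as succeeded without actually executing it.
# The date must be the execution_date of the scheduled run you want to skip.
airflow run abc xyz 2020-06-27T07:00:00 --mark_success
```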

How to install Airflow?

Submitted by 社会主义新天地 on 2020-06-24 08:09:08
Question: I seem to be doing something wrong. https://pythonhosted.org/airflow/start.html $ export AIRFLOW_HOME=~/airflow $ pip install apache-airflow Requirement already satisfied $ airflow initdb airflow: Command not found python --version Python 2.7.10 It's weird - the installation seemed to have worked fine (with some warnings - nothing serious), saying: airflow, flask, etc. successfully installed. But even after restarting the PC (Ubuntu 15.10), airflow is still not recognized as a command. Answer 1: You can create a
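The answer is cut off here. One common resolution for "airflow: Command not found" after a user-level pip install (an assumption about where the truncated answer was heading) is that the console script lands in ~/.local/bin, which is not on PATH:

```bash
# If pip installed Airflow into the per-user site-packages, the `airflow`
# entry point usually lives in ~/.local/bin; add it to PATH (e.g. in ~/.bashrc).
export PATH="$HOME/.local/bin:$PATH"
which airflow && airflow initdb
```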

How can we use SFTPToGCSOperator in a GCP Composer environment (1.10.6)?

Submitted by 痞子三分冷 on 2020-06-23 08:46:10
Question: I want to use SFTPToGCSOperator in a Composer environment (1.10.6) on GCP. I know there is a limitation, because the operator is present only in the latest version of Airflow, not in the latest Composer version, 1.10.6. See the reference - https://airflow.readthedocs.io/en/latest/howto/operator/gcp/sftp_to_gcs.html I found an alternative to the operator and created a plugin class, but then I hit an issue with the SFTPHook class, so now I am using an older version of the SFTPHook class. See the reference below - from
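Sketching the kind of plugin-free workaround that fits 1.10.6, using the SFTP and GCS hooks that ship in contrib; the connection IDs, bucket and paths below are placeholders, not taken from the question:

```python
# Sketch of a 1.10.x-compatible workaround: download via SFTPHook, upload via
# GoogleCloudStorageHook, typically wired into the DAG with a PythonOperator.
from tempfile import NamedTemporaryFile

from airflow.contrib.hooks.sftp_hook import SFTPHook
from airflow.contrib.hooks.gcs_hook import GoogleCloudStorageHook


def sftp_to_gcs(remote_path, bucket, object_name):
    """Copy one file from an SFTP server into a GCS bucket via a local temp file."""
    sftp_hook = SFTPHook(ftp_conn_id="sftp_default")
    gcs_hook = GoogleCloudStorageHook(google_cloud_storage_conn_id="google_cloud_default")
    with NamedTemporaryFile() as tmp:
        sftp_hook.retrieve_file(remote_path, tmp.name)
        gcs_hook.upload(bucket=bucket, object=object_name, filename=tmp.name)
```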

Airflow - xcom_pull in the BigQuery operator

Submitted by 断了今生、忘了曾经 on 2020-06-17 09:41:49
Question: I am trying to use xcom_pull to insert a data_key_param calculated by the python_operator and pass it to the bigquery_operator. The Python operator returns the output as a string, e.g. "2020-05-31". I get an error when running the BigQueryOperator: "Dependencies Blocking Task From Getting Scheduled" - Could not cast literal "{xcom_pull(task_ids[\'set_date_key_param\'])[0] }" The sql attribute value returned from the Airflow GUI after task execution: SELECT DATE_KEY, count(*) as COUNT FROM my-project
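Judging from the un-rendered literal in the error, the xcom_pull call is not wrapped in Jinja's double braces (and the ti. prefix is missing), so BigQuery receives the template text itself. A hedged sketch of the usual pattern, with a made-up DAG, table and upstream task id:

```python
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.bigquery_operator import BigQueryOperator

dag = DAG("date_key_report", start_date=datetime(2020, 5, 1), schedule_interval="@daily")

# The upstream PythonOperator (task_id='set_date_key_param') is assumed to push
# the date string ("2020-05-31") as its return value.
count_by_date_key = BigQueryOperator(
    task_id="count_by_date_key",
    sql="""
        SELECT DATE_KEY, COUNT(*) AS cnt
        FROM `my-project.my_dataset.my_table`
        WHERE DATE_KEY = '{{ ti.xcom_pull(task_ids='set_date_key_param') }}'
        GROUP BY DATE_KEY
    """,
    use_legacy_sql=False,
    dag=dag,
)
```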

DAG is not visible on Airflow UI

Submitted by 自闭症网瘾萝莉.ら on 2020-06-17 09:34:28
Question: This is my DAG file in the dags folder. Code that goes along with the Airflow tutorial located at: http://airflow.readthedocs.org/en/latest/tutorial.html """ from airflow import DAG from airflow.operators.dummy_operator import DummyOperator from airflow.operators.python_operator import PythonOperator from datetime import datetime, timedelta from work_file import Test class Main(Test): def __init__(self): super(Test, self).__init__() def create_dag(self): default_args = { "owner": "airflow", "depends_on
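The excerpt is truncated, but a frequent cause of a DAG not showing up is that the DAG object built inside a class or method never gets assigned to a module-level variable, which is where the scheduler looks. A minimal sketch (placeholder dag_id and task, not the asker's code) of the pattern that does appear in the UI:

```python
# The scheduler only picks up DAG objects that exist in the module's global
# namespace, so expose the object returned by create_dag() at the top level.
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator


def create_dag():
    default_args = {"owner": "airflow", "start_date": datetime(2020, 6, 1)}
    dag = DAG("my_dag_id", default_args=default_args, schedule_interval="@daily")
    DummyOperator(task_id="start", dag=dag)
    return dag


# Without a module-level assignment like this, the DAG stays invisible in the UI.
dag = create_dag()
```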

Airflow Scheduler continuously issues warnings when using PostgreSQL 12 as the backend database

Submitted by 别说谁变了你拦得住时间么 on 2020-06-17 09:11:26
Question: While executing, the Airflow scheduler keeps printing the following messages and tasks are NOT getting picked up. [2020-02-21 09:21:20,696] {dag_processing.py:663} WARNING - DagFileProcessorManager (PID=11895) exited with exit code -11 - re-launching [2020-02-21 09:21:20,699] {dag_processing.py:556} INFO - Launched DagFileProcessorManager with pid: 11898 [2020-02-21 09:21:20,711] {settings.py:54} INFO - Configured default timezone <Timezone [UTC]> [2020-02-21 09:21:20,725] {settings.py:253} INFO
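Exit code -11 corresponds to SIGSEGV, i.e. the DAG-file processor subprocess is segfaulting. As a hedged first diagnostic (not necessarily the fix from the original thread), parsing the DAGs outside the scheduler shows whether the crash comes from DAG parsing itself:

```bash
# If either of these also crashes, the segfault is in DAG parsing
# (top-level imports, DB/network calls in the DAG file), not the scheduler loop.
python ~/airflow/dags/my_dag.py   # my_dag.py is a placeholder path
airflow list_dags
```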

Google Composer - How do I install Microsoft SQL Server ODBC drivers on environments?

Submitted by 时光毁灭记忆、已成空白 on 2020-06-17 03:37:54
Question: I am new to GCP and Airflow and am trying to run my Python pipelines via a simple pyodbc connection in Python 3. However, I believe I have found what I need to install on the machines [Microsoft doc] https://docs.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server?view=sql-server-2017 , but I am not sure where in GCP to run these commands. I have gone down several deep holes looking for answers but don't know how to solve the problem. Here
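For reference, the Debian flavour of the linked Microsoft instructions looks roughly like the sketch below. Where these commands can actually be run in a managed Composer environment is exactly the open question here, so treat this as the generic driver install for a Debian-like host, not a Composer-specific recipe:

```bash
# Register Microsoft's package feed and install the ODBC driver (Debian 9 example).
curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
curl https://packages.microsoft.com/config/debian/9/prod.list | sudo tee /etc/apt/sources.list.d/mssql-release.list
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get install -y msodbcsql17 unixodbc-dev
```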

Airflow: how can I put the method for reading a JSON file in a local library?

Submitted by 本小妞迷上赌 on 2020-06-15 06:45:29
Question: I have to generate some DAGs. I've saved the JSON table schema files in a GCP bucket. The files in the GCP bucket associated with Composer are remapped to /home/airflow/gcs/dags/ . If I define the method for reading the JSON file after the creation of the DAG, all goes fine. But if I want to generate some "common code" (to put in a library of mine), I can't access the file system from the code in the library; specifically, I can't use the Python json library. The strange thing is that, I define
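A hedged sketch of the kind of helper the asker seems to be after, assuming the schema files live under the dags folder that Composer maps from the bucket; the function and file names are made up for illustration:

```python
# Small library helper that resolves schema files relative to the dags folder
# Composer mounts from GCS, instead of relying on the process working directory.
import json
import os

from airflow.configuration import conf


def load_table_schema(relative_path):
    # e.g. relative_path = "schemas/my_table.json", stored next to the DAGs in GCS
    dags_folder = conf.get("core", "dags_folder")  # /home/airflow/gcs/dags on Composer
    with open(os.path.join(dags_folder, relative_path)) as f:
        return json.load(f)
```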

Airflow/Luigi for AWS EMR automatic cluster creation and PySpark deployment

Submitted by ℡╲_俬逩灬. on 2020-06-13 05:36:48
Question: I am new to Airflow automation; I don't know whether it is possible to do this with Apache Airflow (or Luigi etc.) or whether I should just write one long bash file. I want to build a DAG for this: create/clone a cluster on AWS EMR; install Python requirements; install PySpark-related libraries; get the latest code from GitHub; submit the Spark job; terminate the cluster on finish. For the individual steps, I can write .sh files like the ones below (not sure whether that is a good approach or not), but I don't know how to do it in Airflow. 1)
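One way to express that pipeline natively in Airflow 1.10, rather than one long bash script, is with the EMR operators from contrib; the cluster config, S3 path and connection IDs below are placeholders. Python requirements and PySpark libraries would typically be installed through EMR bootstrap actions declared inside the job-flow overrides.

```python
# Sketch: create an EMR cluster, submit a PySpark step, wait for it, then terminate.
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.emr_create_job_flow_operator import EmrCreateJobFlowOperator
from airflow.contrib.operators.emr_add_steps_operator import EmrAddStepsOperator
from airflow.contrib.operators.emr_terminate_job_flow_operator import EmrTerminateJobFlowOperator
from airflow.contrib.sensors.emr_step_sensor import EmrStepSensor

JOB_FLOW_OVERRIDES = {"Name": "pyspark-cluster"}  # instance types, bootstrap actions, etc. go here
SPARK_STEPS = [{
    "Name": "run_pyspark_job",
    "ActionOnFailure": "CONTINUE",
    "HadoopJarStep": {
        "Jar": "command-runner.jar",
        "Args": ["spark-submit", "s3://my-bucket/jobs/job.py"],  # placeholder script path
    },
}]

dag = DAG("emr_pyspark_pipeline", start_date=datetime(2020, 6, 1), schedule_interval=None)

create_cluster = EmrCreateJobFlowOperator(
    task_id="create_cluster",
    job_flow_overrides=JOB_FLOW_OVERRIDES,
    aws_conn_id="aws_default",
    emr_conn_id="emr_default",
    dag=dag,
)

add_step = EmrAddStepsOperator(
    task_id="add_step",
    job_flow_id="{{ task_instance.xcom_pull(task_ids='create_cluster', key='return_value') }}",
    steps=SPARK_STEPS,
    aws_conn_id="aws_default",
    dag=dag,
)

watch_step = EmrStepSensor(
    task_id="watch_step",
    job_flow_id="{{ task_instance.xcom_pull(task_ids='create_cluster', key='return_value') }}",
    step_id="{{ task_instance.xcom_pull(task_ids='add_step', key='return_value')[0] }}",
    aws_conn_id="aws_default",
    dag=dag,
)

terminate_cluster = EmrTerminateJobFlowOperator(
    task_id="terminate_cluster",
    job_flow_id="{{ task_instance.xcom_pull(task_ids='create_cluster', key='return_value') }}",
    aws_conn_id="aws_default",
    dag=dag,
)

create_cluster >> add_step >> watch_step >> terminate_cluster
```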
