airflow

(Django) ORM in airflow - is it possible?

Submitted by 时光怂恿深爱的人放手 on 2020-05-10 07:42:23
Question: How do I work with Django models inside Airflow tasks? According to the official Airflow documentation, Airflow provides hooks for interacting with databases (MySqlHook / PostgresHook / etc.) that can later be used in operators for raw query execution. Attaching the core code fragments, from https://airflow.apache.org/_modules/mysql_hook.html:

    class MySqlHook(DbApiHook):
        conn_name_attr = 'mysql_conn_id'
        default_conn_name = 'mysql_default'
        supports_autocommit = True

        def get_conn(self):
            """
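The snippet above is truncated. As a hedged sketch only (not taken from the question or any answer), one common pattern is to initialize Django inside the task callable so its ORM can be used from a PythonOperator; the project path, settings module, and model below are placeholder assumptions.

    # Sketch: using the Django ORM from inside an Airflow PythonOperator.
    # '/path/to/django/project', 'myproject.settings' and 'myapp.models' are hypothetical.
    import os
    import sys
    from datetime import datetime

    from airflow.models import DAG
    from airflow.operators.python_operator import PythonOperator

    def run_django_query(**kwargs):
        sys.path.append('/path/to/django/project')                             # hypothetical project root
        os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myproject.settings')  # hypothetical settings module
        import django
        django.setup()                      # configure Django apps/ORM before importing models
        from myapp.models import MyModel    # hypothetical Django model
        print(MyModel.objects.count())

    dag = DAG(dag_id='django_orm_example', start_date=datetime(2020, 5, 1), schedule_interval=None)

    run_query = PythonOperator(
        task_id='run_django_query',
        python_callable=run_django_query,
        dag=dag,
    )

Deferring the Django imports into the callable keeps DAG parsing fast and avoids configuring Django in the scheduler process.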

Airflow - Python file NOT in the same DAG folder

Submitted by 二次信任 on 2020-05-09 21:05:59
Question: I am trying to use Airflow to execute a simple Python task.

    from __future__ import print_function
    from airflow.operators.python_operator import PythonOperator
    from airflow.models import DAG
    from datetime import datetime, timedelta
    from pprint import pprint

    seven_days_ago = datetime.combine(datetime.today() - timedelta(7), datetime.min.time())

    args = {
        'owner': 'airflow',
        'start_date': seven_days_ago,
    }

    dag = DAG(dag_id='python_test', default_args=args)

    def print_context(ds, **kwargs):
        pprint
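The code above is cut off mid-function; the usual sticking point in this question is importing a Python file that does not live in the DAG folder. A minimal sketch, assuming a hypothetical helper module at /opt/scripts/my_helpers.py:

    # Sketch: making a module outside the DAG folder importable.
    # '/opt/scripts' and 'my_helpers' are assumptions for illustration.
    import sys
    sys.path.insert(0, '/opt/scripts')   # directory containing the external .py file

    from datetime import datetime

    from airflow.models import DAG
    from airflow.operators.python_operator import PythonOperator

    import my_helpers   # hypothetical module living in /opt/scripts

    dag = DAG(dag_id='external_module_example', start_date=datetime(2020, 5, 1), schedule_interval=None)

    task = PythonOperator(
        task_id='call_external_code',
        python_callable=my_helpers.print_context,   # hypothetical function in the external module
        provide_context=True,
        dag=dag,
    )

Alternatives include packaging the helper code as an installable module or pointing PYTHONPATH at that directory in the worker environment.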

Airflow DAG is running for all the retries

Submitted by 孤街醉人 on 2020-04-17 22:06:18
Question: I have a DAG that has been running for a few months, and for the last week it has been behaving abnormally. I am running a BashOperator that executes a shell script, and in the shell script we have a Hive query. The number of retries is set to 4, as below.

    default_args = {
        'owner': 'airflow',
        'depends_on_past': False,
        'email': ['airflow@example.com'],
        'email_on_failure': False,
        'email_on_retry': False,
        'retries': 4,
        'retry_delay': timedelta(minutes=5)
    }

I can see in the log that it is triggering the Hive query and losing the
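Since the question is truncated, here is a hedged sketch of the setup being described: a BashOperator with the default_args above whose shell script runs the Hive query. The script path is an assumption. Airflow retries only when the command exits non-zero (or the task otherwise ends in a failure state), and each retry waits retry_delay before starting.

    # Sketch: BashOperator running a shell script that wraps a Hive query.
    # '/path/to/run_hive_query.sh' is a placeholder.
    from datetime import datetime, timedelta

    from airflow.models import DAG
    from airflow.operators.bash_operator import BashOperator

    default_args = {
        'owner': 'airflow',
        'depends_on_past': False,
        'email': ['airflow@example.com'],
        'email_on_failure': False,
        'email_on_retry': False,
        'retries': 4,
        'retry_delay': timedelta(minutes=5),
    }

    dag = DAG(dag_id='hive_via_bash', default_args=default_args,
              start_date=datetime(2020, 4, 1), schedule_interval='@daily')

    run_hive = BashOperator(
        task_id='run_hive_script',
        # Trailing space stops Airflow from treating the .sh path as a Jinja template file.
        bash_command='/path/to/run_hive_query.sh ',
        dag=dag,
    )

A task that "runs for all the retries" generally means every attempt is being marked failed, so Airflow keeps retrying until retries is exhausted.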

Airflow doesn't recognise my S3 Connection setting

Submitted by こ雲淡風輕ζ on 2020-04-17 14:18:26
Question: I am using Airflow with the Kubernetes executor and testing it out locally (using minikube). While I was able to get it up and running, I can't seem to store my logs in S3. I have tried all the solutions that are described and I am still getting the following error:

    *** Log file does not exist: /usr/local/airflow/logs/example_python_operator/print_the_context/2020-03-30T16:02:41.521194+00:00/1.log
    *** Fetching from: http://examplepythonoperatorprintthecontext-5b01d602e9d2482193d933e7d2:8793/log/example
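For context (a hedged sketch, not the asker's configuration): with Airflow 1.10.x, writing task logs to S3 is driven by the remote-logging settings in [core], and with the Kubernetes executor those settings also have to reach the worker pods (for example as AIRFLOW__CORE__... environment variables); otherwise the webserver falls back to fetching logs from the pod, as in the error above. The bucket name and connection id below are placeholders.

    # airflow.cfg (Airflow 1.10.x) - illustrative values only
    [core]
    remote_logging = True
    remote_base_log_folder = s3://my-airflow-logs/logs   # hypothetical bucket
    remote_log_conn_id = my_s3_conn                      # an existing Airflow connection holding S3 credentials
    encrypt_s3_logs = False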

What am I doing wrong in this DAG setup for KubernetesPodOperator

Submitted by ◇◆丶佛笑我妖孽 on 2020-04-13 02:53:58
Question: I found the following Airflow DAG in this blog post:

    from airflow import DAG
    from datetime import datetime, timedelta
    from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator
    from airflow.operators.dummy_operator import DummyOperator

    default_args = {
        'owner': 'airflow',
        'depends_on_past': False,
        'start_date': datetime.utcnow(),
        'email': ['airflow@example.com'],
        'email_on_failure': False,
        'email_on_retry': False,
        'retries': 1,
        'retry_delay': timedelta(minutes=5)
    }
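The DAG code is cut off after default_args. For orientation, a hedged sketch of how such a DAG typically continues; the image, namespace, and task names are illustrative, not taken from the blog post.

    # Sketch: a minimal KubernetesPodOperator DAG; values below are illustrative.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator
    from airflow.operators.dummy_operator import DummyOperator

    default_args = {
        'owner': 'airflow',
        'start_date': datetime.utcnow(),
        'retries': 1,
        'retry_delay': timedelta(minutes=5),
    }

    dag = DAG('kubernetes_sample', default_args=default_args,
              schedule_interval=timedelta(minutes=10))

    start = DummyOperator(task_id='run_this_first', dag=dag)

    passing = KubernetesPodOperator(
        namespace='default',            # hypothetical namespace
        image='python:3.6',             # hypothetical image
        cmds=['python', '-c'],
        arguments=['print("hello world")'],
        name='passing-test',            # pod name
        task_id='passing-task',
        get_logs=True,
        dag=dag,
    )

    start >> passing

A common pitfall with this setup is start_date=datetime.utcnow(): it moves forward on every DAG parse and can keep the scheduler from ever triggering the task.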

Accessing the 'ds' variable in airflow

Submitted by 旧时模样 on 2020-04-11 09:59:22
Question: I am able to access the macros in Python code like below:

    partition_dt = macros.ds_add(ds, 1)

But I am not able to figure out how to get hold of the ds variable itself, which seemingly can only be accessed in templates. Any pointers?

Answer 1: I assume you want to use one of Airflow's built-in default variables: ds, the execution date as YYYY-MM-DD. To get just ds, you can do:

    EXEC_DATE = '{{ ds }}'

To call what you wanted, macros.ds_add:

    EXEC_DATE = '{{ macros.ds_add(ds, 1) }}'

And load it
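To complement the (truncated) answer, a hedged sketch of two ways of reading ds at run time; the dag id, task ids, and callable are illustrative.

    # Sketch: reading 'ds' through a templated field and through the task context.
    from datetime import datetime

    from airflow.models import DAG
    from airflow.operators.bash_operator import BashOperator
    from airflow.operators.python_operator import PythonOperator

    dag = DAG(dag_id='ds_example', start_date=datetime(2020, 4, 1), schedule_interval='@daily')

    # 1) Templated field: the string is rendered by Jinja before the task runs.
    echo_ds = BashOperator(
        task_id='echo_ds',
        bash_command='echo "{{ ds }} plus one day is {{ macros.ds_add(ds, 1) }}"',
        dag=dag,
    )

    # 2) Task context: provide_context=True passes ds (and macros) into the callable.
    def use_ds(ds, **kwargs):
        print('execution date:', ds)

    print_ds = PythonOperator(
        task_id='print_ds',
        python_callable=use_ds,
        provide_context=True,
        dag=dag,
    )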

How to set request_cpu globally for airflow worker pods using the kubernetes executor?

Submitted by 我们两清 on 2020-04-07 07:35:32
Question: I'm trying to set the request_cpu parameter in the Kubernetes executor for Airflow, but I haven't been able to find where I can do that. In the default Airflow config I found default_cpus, but according to this answer it is not used anywhere, and nowhere else in the Kubernetes section could I find a reference to the CPU request. How can I set the request_cpu parameter in the Airflow Kubernetes executor? EDIT: Ideally, what I would like to be able to do is set this as a global default
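The edit is cut off above. As a hedged aside (not an answer from the thread): with the Kubernetes executor, resource requests can at least be set per task through executor_config, which is task-level rather than the global default being asked about; the values below are illustrative.

    # Sketch: per-task CPU/memory requests with the Kubernetes executor (Airflow 1.10.x style).
    from datetime import datetime

    from airflow.models import DAG
    from airflow.operators.python_operator import PythonOperator

    dag = DAG(dag_id='resource_request_example', start_date=datetime(2020, 4, 1), schedule_interval=None)

    def do_work():
        print('working')

    task = PythonOperator(
        task_id='cpu_hungry_task',
        python_callable=do_work,
        executor_config={
            'KubernetesExecutor': {
                'request_cpu': '500m',      # illustrative value
                'request_memory': '512Mi',  # illustrative value
            }
        },
        dag=dag,
    )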