airflow

Pass a list of strings as a parameter of a dependent task in Airflow

Submitted by 天涯浪子 on 2020-01-04 03:18:24
Question: I am trying to pass a list of strings from one task to another via XCom, but I cannot seem to get the pushed list interpreted back as a list. For example, when I do this in some function blah that is run in a ShortCircuitOperator: paths = ['gs://{}/{}'.format(bucket, obj) for obj in my_list] kwargs['ti'].xcom_push(key='return_value', value=full_paths) and then I want to use that list as a parameter of an operator. For example, run_task_after_blah = AfterBlahOperator( task_id=
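
As a minimal sketch of the XCom round trip (task and variable names here are placeholders, not taken from the original post), a list pushed from one PythonOperator can be pulled back as a Python list in another callable; the common surprise is that pulling it inside a Jinja-templated operator argument renders it as a string instead:

    from datetime import datetime
    from airflow.models import DAG
    from airflow.operators.python_operator import PythonOperator

    dag = DAG('xcom_list_example', start_date=datetime(2020, 1, 1), schedule_interval=None)

    def blah(**kwargs):
        bucket = 'my-bucket'                                      # assumed value
        my_list = ['a.csv', 'b.csv']                              # assumed value
        paths = ['gs://{}/{}'.format(bucket, obj) for obj in my_list]
        kwargs['ti'].xcom_push(key='return_value', value=paths)   # pushes the whole list

    def use_paths(**kwargs):
        # Pulled in Python code, this comes back as the original list object.
        paths = kwargs['ti'].xcom_pull(task_ids='blah_task', key='return_value')
        for p in paths:
            print(p)

    push = PythonOperator(task_id='blah_task', python_callable=blah,
                          provide_context=True, dag=dag)
    pull = PythonOperator(task_id='use_paths', python_callable=use_paths,
                          provide_context=True, dag=dag)
    push >> pull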

Writing and importing custom plugins in Airflow

Submitted by 拟墨画扇 on 2020-01-03 06:27:34
Question: This is actually two questions combined into one. My AIRFLOW_HOME is structured like

    airflow
    +-- dags
    +-- plugins
        +-- __init__.py
        +-- hooks
            +-- __init__.py
            +-- my_hook.py
            +-- another_hook.py
        +-- operators
            +-- __init__.py
            +-- my_operator.py
            +-- another_operator.py
        +-- sensors
        +-- utils

I've been following astronomer.io's examples here: https://github.com/airflow-plugins. My custom operators use my custom hooks, and all the imports are relative to the top-level folder plugins. # my_operator.py
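
A rough sketch of the operator/hook import pattern implied above (class and parameter names are assumptions; the original files are not shown). Because Airflow adds AIRFLOW_HOME/plugins to sys.path, modules inside it can usually be imported relative to that folder:

    # my_operator.py (sketch; MyHook and MyOperator are placeholder names)
    from airflow.models import BaseOperator
    from airflow.utils.decorators import apply_defaults

    from hooks.my_hook import MyHook   # resolves because the plugins folder is on sys.path

    class MyOperator(BaseOperator):
        @apply_defaults
        def __init__(self, my_param, *args, **kwargs):
            super(MyOperator, self).__init__(*args, **kwargs)
            self.my_param = my_param

        def execute(self, context):
            # Delegate the actual work to the custom hook.
            return MyHook().run(self.my_param)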

Copy files from one Google Cloud Storage bucket to another using Apache Airflow

Submitted by 狂风中的少年 on 2020-01-03 02:30:09
Question: Problem: I want to copy files from a folder in a Google Cloud Storage bucket (e.g. Folder1 in Bucket1) to another bucket (e.g. Bucket2). I can't find any Airflow operator for Google Cloud Storage that copies files. Answer 1: I know this is an old question, but I found myself dealing with this task too. Since I'm using Google Cloud Composer, GoogleCloudStorageToGoogleCloudStorageOperator was not available in the current version. I managed to solve this issue by using a simple BashOperator from airflow
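
A sketch of the BashOperator workaround described in the answer, assuming gsutil is available on the worker (bucket and folder names are placeholders taken from the question's example):

    from airflow.operators.bash_operator import BashOperator

    copy_folder = BashOperator(
        task_id='copy_gcs_folder',
        # -m parallelizes the copy, -r recurses into the folder; adjust paths as needed.
        bash_command='gsutil -m cp -r gs://Bucket1/Folder1/* gs://Bucket2/Folder1/',
        dag=dag,  # assumes a DAG object named `dag` is defined elsewhere
    )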

Apache Airflow - customize logging format

Submitted by a 夏天 on 2020-01-02 05:43:07
Question: Is it possible to customize the format that Airflow uses for logging? I tried adding a LOG_FORMAT variable in $AIRFLOW_HOME/airflow.cfg, but it doesn't seem to take effect: LOG_FORMAT = "%(asctime)s logLevel=%(levelname)s logger=%(name)s - %(message)s" Answer 1: You need to change the settings.py file in the airflow package to change the log format. Update settings.py (after LOGGING_LEVEL, add the line below): LOG_FORMAT = os.path.expanduser(conf.get('core', 'LOG_FORMAT')) Update airflow.cfg
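
A sketch of the two edits the answer describes, for older Airflow versions where the format was hard-coded; the exact placement in settings.py is an assumption based on the answer:

    # In $AIRFLOW_HOME/airflow.cfg, under [core]:
    #   LOG_FORMAT = %(asctime)s logLevel=%(levelname)s logger=%(name)s - %(message)s

    # In the installed airflow package's settings.py, just after LOGGING_LEVEL
    # (os and conf are already imported in that file):
    LOG_FORMAT = os.path.expanduser(conf.get('core', 'LOG_FORMAT'))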

Generating dynamic tasks in airflow based on output of an upstream task

Submitted by 主宰稳场 on 2020-01-02 05:31:10
Question: How do I generate tasks dynamically based on the list returned from an upstream task? I have tried the following: Using an external file to write and read the list - this works, but I am looking for a more elegant solution. XCom pull inside a SubDAG factory - it does not work. I am able to pass a list from the upstream task to a SubDAG, but that XCom is only accessible inside the SubDAG's tasks and cannot be used to loop/iterate over the returned list and generate tasks. For e.g.
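
One commonly suggested workaround (not from the original post) is to have the upstream task write the list to an Airflow Variable, since DAG-definition code runs at parse time and cannot see XComs; a rough sketch, with the Variable name assumed:

    import json
    from airflow.models import Variable
    from airflow.operators.bash_operator import BashOperator

    # The upstream task (or an external process) is assumed to keep this Variable updated.
    items = json.loads(Variable.get('my_item_list', default_var='[]'))

    for item in items:
        BashOperator(
            task_id='process_{}'.format(item),
            bash_command='echo processing {}'.format(item),
            dag=dag,  # assumes a DAG object named `dag` exists
        )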

Airflow basic auth - cannot create user

Submitted by 亡梦爱人 on 2020-01-02 05:20:13
Question: I'm running Airflow v1.9.0. I am trying to get some form of authentication working but have so far failed to get GitHub auth and password auth working. The password auth feels like it should be pretty straightforward, and I'm hoping someone can point me in the right direction. My airflow.cfg has the following: [webserver] authenticate = True auth_backend = airflow.contrib.auth.backends.password_auth Following the instructions here https://airflow.incubator.apache.org/security.html#password I
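
For reference, the user-creation snippet from the Airflow 1.9 password-auth documentation linked above is roughly the following (the username, email, and password are placeholders):

    import airflow
    from airflow import models, settings
    from airflow.contrib.auth.backends.password_auth import PasswordUser

    user = PasswordUser(models.User())
    user.username = 'new_user_name'
    user.email = 'new_user_email@example.com'
    user.password = 'set_the_password'

    session = settings.Session()
    session.add(user)
    session.commit()
    session.close()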

Running an Airflow DAG every X minutes

Submitted by 与世无争的帅哥 on 2020-01-02 03:31:06
Question: I am using Airflow on an EC2 instance using the LocalScheduler option. I've invoked airflow scheduler and airflow webserver and everything seems to be running fine. That said, after supplying the cron string '*/10 * * * *' to schedule_interval for "do this every 10 minutes," the job continues to execute every 24 hours by default. Here's the header of the code: from datetime import datetime import os import sys from airflow.models import DAG from airflow.operators.python_operator import
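
A minimal sketch of a DAG header that actually runs every 10 minutes (the dag_id and callable are placeholders); the usual culprits for the every-24-hours behaviour are a schedule_interval left at the default or set in the wrong place, or a start_date that keeps moving:

    from datetime import datetime
    from airflow.models import DAG
    from airflow.operators.python_operator import PythonOperator

    dag = DAG(
        dag_id='every_ten_minutes',         # assumed name
        schedule_interval='*/10 * * * *',   # cron string passed to the DAG itself
        start_date=datetime(2020, 1, 1),    # fixed date in the past, not datetime.now()
    )

    def say_hello():
        print('hello')

    PythonOperator(task_id='hello', python_callable=say_hello, dag=dag)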

How to get airflow schedule_interval to work correctly

Submitted by *爱你&永不变心* on 2020-01-02 01:10:14
Question: I want to try using Airflow instead of cron, but schedule_interval doesn't work as I expected. I wrote the Python code below. In my understanding, Airflow should have run at "2016/03/30 8:15:00", but it didn't run at that time. If I change it to "'schedule_interval': timedelta(minutes = 5)", it works correctly, I think. The "notice_slack.sh" script just calls the Slack API to post to my channels. # -*- coding: utf-8 -*- from __future__ import absolute_import, unicode_literals import os
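
Two things usually explain this (stated as assumptions, since the original code is truncated here): Airflow starts a run only after the scheduled interval has elapsed, so an 08:15 run for March 29 fires on March 30 at 08:15; and schedule_interval belongs on the DAG object, not in default_args. A sketch:

    from datetime import datetime
    from airflow.models import DAG

    default_args = {
        'owner': 'airflow',
        'start_date': datetime(2016, 3, 29),   # assumed; must be before the first desired run
    }

    dag = DAG(
        dag_id='notice_slack',           # assumed name
        default_args=default_args,
        schedule_interval='15 8 * * *',  # daily at 08:15; the 03-29 run triggers on 03-30 08:15
    )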

How do I use the --conf option in Airflow

Submitted by 混江龙づ霸主 on 2020-01-01 09:26:06
Question: I am trying to run an Airflow DAG and need to pass some parameters to the tasks. How do I read the JSON string passed as the --conf parameter of the command-line trigger_dag command in the Python DAG file? For example: airflow trigger_dag 'dag_name' -r 'run_id' --conf '{"key":"value"}' Answer 1: Two ways. From inside a template field or file: {{ dag_run.conf['key'] }} Or, when a context is available, e.g. within a Python callable of the PythonOperator: context['dag_run'].conf['key'] Source: https:/
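
A sketch of the second approach inside a DAG file (the task name is a placeholder; provide_context is needed on Airflow 1.x so the callable receives the context):

    from airflow.operators.python_operator import PythonOperator

    def read_conf(**context):
        # Holds the JSON passed via: airflow trigger_dag 'dag_name' --conf '{"key":"value"}'
        value = context['dag_run'].conf.get('key')
        print(value)

    PythonOperator(
        task_id='read_conf',
        python_callable=read_conf,
        provide_context=True,
        dag=dag,  # assumes a DAG object named `dag` exists
    )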

Apache Airflow - trigger/schedule DAG rerun on completion (File Sensor)

Submitted by 不问归期 on 2019-12-31 22:41:12
Question: Good morning. I'm trying to set up a DAG to (1) watch/sense for a file to hit a network folder, (2) process the file, and (3) archive the file. Using the tutorials online and Stack Overflow, I have been able to come up with the following DAG and operator that successfully achieve the objectives; however, I would like the DAG to be rescheduled or rerun on completion so it starts watching/sensing for another file. I attempted to set a variable max_active_runs:1 and then a schedule_interval: timedelta(seconds=5)
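
One way to get the rerun-on-completion behaviour (an assumption, not necessarily what the asker settled on) is to make the last task re-trigger the same DAG with TriggerDagRunOperator, so a fresh run goes back to sensing as soon as the file is archived:

    from airflow.operators.dagrun_operator import TriggerDagRunOperator

    def always_trigger(context, dag_run_obj):
        # Returning the dag_run_obj tells older Airflow versions to go ahead with the trigger.
        return dag_run_obj

    rerun_self = TriggerDagRunOperator(
        task_id='rerun_self',
        trigger_dag_id='file_sensor_dag',   # assumed dag_id of this same DAG
        python_callable=always_trigger,
        dag=dag,                            # assumes the DAG object is named `dag`
    )

    # e.g. archive_file >> rerun_self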