airflow-operator

How can we use SFTPToGCSOperator in a GCP Composer environment (1.10.6)?

Submitted by 痞子三分冷 on 2020-06-23 08:46:10
Question: I want to use SFTPToGCSOperator in a GCP Composer environment (1.10.6). I know there is a limitation, because the operator is only present in the latest version of Airflow, not in the latest Composer version, 1.10.6. See the reference: https://airflow.readthedocs.io/en/latest/howto/operator/gcp/sftp_to_gcs.html I found an alternative to the operator and created a plugin class, but then I ran into an issue with the SFTPHook class, so now I am using an older version of the SFTPHook class. See the reference below: from
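One common workaround is to backport the behavior as a custom plugin operator that pairs the contrib SFTPHook with the contrib GoogleCloudStorageHook, both of which ship with Airflow 1.10.6. The sketch below is a minimal illustration under that assumption; the class name, parameters, and connection ids are placeholders, not the official SFTPToGCSOperator API.

```python
# Minimal sketch of a backported SFTP-to-GCS operator for Airflow 1.10.6.
# Class name, parameters and connection ids are illustrative placeholders.
import os
import tempfile

from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults
from airflow.contrib.hooks.sftp_hook import SFTPHook
from airflow.contrib.hooks.gcs_hook import GoogleCloudStorageHook


class SFTPToGCSPluginOperator(BaseOperator):
    """Copy a single file from an SFTP server to a GCS bucket."""

    template_fields = ("source_path", "destination_bucket", "destination_path")

    @apply_defaults
    def __init__(self,
                 source_path,
                 destination_bucket,
                 destination_path,
                 sftp_conn_id="sftp_default",
                 gcp_conn_id="google_cloud_default",
                 *args, **kwargs):
        super(SFTPToGCSPluginOperator, self).__init__(*args, **kwargs)
        self.source_path = source_path
        self.destination_bucket = destination_bucket
        self.destination_path = destination_path
        self.sftp_conn_id = sftp_conn_id
        self.gcp_conn_id = gcp_conn_id

    def execute(self, context):
        sftp_hook = SFTPHook(ftp_conn_id=self.sftp_conn_id)
        gcs_hook = GoogleCloudStorageHook(
            google_cloud_storage_conn_id=self.gcp_conn_id)
        with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
            local_path = tmp.name
        try:
            # Download from SFTP to a local temp file, then upload it to GCS.
            sftp_hook.retrieve_file(self.source_path, local_path)
            gcs_hook.upload(self.destination_bucket,
                            self.destination_path,
                            local_path)
        finally:
            os.remove(local_path)
```

Registering this class through an AirflowPlugin (or importing it from the plugins folder) makes it usable in a Composer 1.10.6 DAG until the environment can be upgraded to a version that bundles the real operator.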

How to mount a volume from the Airflow worker into an Airflow KubernetesPodOperator pod?

Submitted by 十年热恋 on 2020-05-13 07:27:27
Question: I am trying to use the KubernetesPodOperator in Airflow, and there is a directory on my Airflow worker that I wish to share with the Kubernetes pod. Is there a way to mount the Airflow worker's directory into the Kubernetes pod? I tried the code below, but the volume does not seem to be mounted successfully. import datetime import unittest from unittest import TestCase from airflow.operators.kubernetes_pod_operator import KubernetesPodOperator from airflow.kubernetes.volume import Volume from airflow
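For reference, the shape of a working volume configuration in Airflow 1.10.x looks roughly like the sketch below. This is a minimal example assuming a hostPath volume; import paths vary between 1.10.x minor versions (contrib vs. non-contrib modules), and all names and paths here are illustrative.

```python
# Minimal hostPath volume sketch for KubernetesPodOperator (Airflow 1.10.x).
# Paths, names and the image are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator
from airflow.contrib.kubernetes.volume import Volume
from airflow.contrib.kubernetes.volume_mount import VolumeMount

dag = DAG(
    dag_id='pod_with_volume',
    start_date=datetime(2020, 1, 1),
    schedule_interval=None,
)

volume = Volume(
    name="shared-data",
    configs={"hostPath": {"path": "/home/airflow/shared", "type": "Directory"}},
)
volume_mount = VolumeMount(
    "shared-data",        # must match the Volume name above
    mount_path="/data",   # where the directory appears inside the pod
    sub_path=None,
    read_only=False,
)

task = KubernetesPodOperator(
    task_id="pod-with-volume",
    name="pod-with-volume",
    namespace="default",
    image="python:3.7-slim",
    cmds=["ls", "/data"],
    volumes=[volume],
    volume_mounts=[volume_mount],
    dag=dag,
)
```

Note that hostPath exposes a directory on the Kubernetes node where the pod is scheduled, so the worker's directory is only visible this way if the worker runs on that same node; otherwise a shared backing store (NFS, a PersistentVolumeClaim, or a GCS/FUSE mount) is usually needed.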

Airflow DAG is running for all the retries

Submitted by 孤街醉人 on 2020-04-17 22:06:18
Question: I have a DAG that has been running for a few months, and for the last week it has been behaving abnormally. I am running a BashOperator that executes a shell script, and in the shell script we have a Hive query. The number of retries is set to 4 as below. default_args = { 'owner': 'airflow', 'depends_on_past': False, 'email': ['airflow@example.com'], 'email_on_failure': False, 'email_on_retry': False, 'retries': 4, 'retry_delay': timedelta(minutes=5) } I can see in the log that it triggers the Hive query and then loses the
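For context, a runnable reconstruction of the setup described above might look like the following sketch; the DAG id, schedule, and script path are placeholders since they are not shown in the question.

```python
# Reconstruction of the described DAG: a BashOperator wrapping a Hive query,
# with 4 retries. DAG id, schedule and script path are placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email': ['airflow@example.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 4,                          # each failed attempt is retried up to 4 times
    'retry_delay': timedelta(minutes=5),   # 5 minutes between attempts
}

dag = DAG(
    dag_id='hive_load',
    default_args=default_args,
    start_date=datetime(2020, 1, 1),
    schedule_interval='@daily',
)

# If the script exits non-zero (or the task process is killed), Airflow marks
# the attempt failed and schedules the next retry after retry_delay.
run_hive = BashOperator(
    task_id='run_hive_query',
    bash_command='bash /path/to/run_hive_query.sh ',  # trailing space avoids Jinja template lookup
    dag=dag,
)
```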

BigQuery: Some rows belong to different partitions rather than the destination partition

Submitted by 牧云@^-^@ on 2020-01-25 09:20:28
Question: I am running an Airflow DAG that moves data from GCS to BQ using the GoogleCloudStorageToBigQueryOperator operator; I am on Airflow version 1.10.2. This task moves data from MySQL to BQ (a partitioned table). Until now the table was partitioned by ingestion time, and the incremental load for the past three days worked fine when the data was loaded using the Airflow DAG. Now we changed the partition type to date or timestamp on a DATE column of the table, after which we have started getting this
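For reference, a minimal sketch of a column-partitioned load with this operator is shown below, assuming the time_partitioning argument is available on the 1.10.2 contrib operator; bucket, table, and field names are placeholders. The "some rows belong to different partitions" rejection typically occurs when the load targets a single partition decorator (table$YYYYMMDD) while the rows carry DATE values from other partitions; loading into the plain table name lets BigQuery route each row to the partition matching its DATE column.

```python
# Sketch of a column-partitioned GCS -> BQ load (Airflow 1.10.2, contrib
# operator). Bucket, object paths, table and field names are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator

dag = DAG(
    dag_id='gcs_to_bq_partitioned',
    start_date=datetime(2020, 1, 1),
    schedule_interval='@daily',
)

load_to_bq = GoogleCloudStorageToBigQueryOperator(
    task_id='gcs_to_bq',
    bucket='my-bucket',
    source_objects=['exports/orders/{{ ds_nodash }}/*.json'],
    source_format='NEWLINE_DELIMITED_JSON',
    # No "$YYYYMMDD" partition decorator here: BigQuery places each row into
    # the partition that matches its order_date value.
    destination_project_dataset_table='my_project.my_dataset.orders',
    write_disposition='WRITE_APPEND',
    time_partitioning={'type': 'DAY', 'field': 'order_date'},
    dag=dag,
)
```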

Airflow on Docker - Path issue

Submitted by ≡放荡痞女 on 2020-01-06 08:10:45
Question: Working with Airflow, I am trying to get a simple DAG to work. I wrote custom operators and other files that I want to import into the main file where the DAG logic is. Here is the folder structure: ├── airflow.cfg ├── dags │ ├── __init__.py │ ├── dag.py │ └── sql_statements.sql ├── docker-compose.yaml ├── environment.yml └── plugins ├── __init__.py └── operators ├── __init__.py ├── facts_calculator.py ├── has_rows.py └── s3_to_redshift.py I set up the volumes correctly in the compose file, since I can see them when I
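A common way to make such custom operators importable is to register them through the plugins mechanism, and to make sure the plugins folder is mounted into every Airflow container (webserver, scheduler, worker) and referenced by plugins_folder in airflow.cfg. The sketch below is illustrative only: the operator class names are assumed from the file names in the tree above, and the plugin name is a placeholder.

```python
# plugins/__init__.py -- sketch of a plugin registering the custom operators.
# Class names are assumed from the file names shown in the folder tree.
from airflow.plugins_manager import AirflowPlugin

# Airflow 1.10 adds the plugins folder to the Python path, so the sibling
# "operators" package can be imported directly here.
from operators.facts_calculator import FactsCalculatorOperator
from operators.has_rows import HasRowsOperator
from operators.s3_to_redshift import S3ToRedshiftOperator


class CustomPlugin(AirflowPlugin):
    name = "custom_plugin"
    operators = [FactsCalculatorOperator, HasRowsOperator, S3ToRedshiftOperator]
```

With this in place, the registered classes become importable from dag.py in the Airflow 1.10 plugin style, e.g. from airflow.operators.custom_plugin import FactsCalculatorOperator, provided the mounted plugins directory is the one plugins_folder points at inside the containers.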

catchup = False: why are two scheduled runs still created?

Submitted by 本小妞迷上赌 on 2019-12-24 19:23:06
Question: I have a simple DAG (Airflow v1.10.16, using the SequentialExecutor on a localhost machine) with start_date set in the past and catchup = False: default_args = {'owner': 'test_user', 'start_date': datetime(2019, 12, 1, 1, 00, 00),} graph1 = DAG(dag_id = 'test_dag', default_args=default_args, schedule_interval=timedelta(days=1), catchup = False) t = PythonOperator(task_id='t', python_callable=my_func, dag=graph1) As per the code comments, :param catchup: Perform scheduler catchup (or only run latest)? I expected that when the
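For reference, a runnable reconstruction of the DAG described above is sketched below; my_func is a placeholder since the original callable is not shown.

```python
# Reconstruction of the described DAG; my_func is a placeholder callable.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def my_func():
    print("running t")


default_args = {
    'owner': 'test_user',
    'start_date': datetime(2019, 12, 1, 1, 0, 0),  # start_date in the past
}

graph1 = DAG(
    dag_id='test_dag',
    default_args=default_args,
    schedule_interval=timedelta(days=1),
    catchup=False,  # skip backfilling every interval since start_date
)

t = PythonOperator(task_id='t', python_callable=my_func, dag=graph1)
```

Note that catchup=False does not suppress scheduling entirely: the scheduler skips the backlog of past intervals but still creates a run for the most recent completed schedule interval, so at least one run dated in the past is expected even with catchup disabled.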

Programmatically clear the state of airflow task instances

Submitted by 懵懂的女人 on 2019-12-24 00:59:32
Question: I want to clear the tasks in DAG B when DAG A completes execution. Both A and B are scheduled DAGs. Is there any operator or way to clear the state of the tasks and re-run DAG B programmatically? I'm aware of the CLI option and the Web UI option to clear the tasks. Answer 1: cli.py is an incredibly useful place to peek into the SQLAlchemy magic of Airflow. The clear command is implemented here: @cli_utils.action_logging def clear(args): logging.basicConfig( level=settings.LOGGING_LEVEL, format=settings.SIMPLE
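An alternative to shelling out to the CLI is to call the same logic from Python: load DAG B from the DagBag and clear its task instances over a date range, for example from a PythonOperator placed at the end of DAG A. The sketch below is a minimal illustration assuming the Airflow 1.10-style DAG.clear() method; the DAG id and dates are placeholders.

```python
# Sketch: clear DAG B's task instances programmatically (Airflow 1.10-style API).
# The dag id and the date range are placeholders.
from datetime import datetime

from airflow.models import DagBag


def clear_dag_b():
    dagbag = DagBag()                   # parses the configured DAGs folder
    dag_b = dagbag.get_dag('dag_b')     # the DAG whose tasks we want to reset
    # Clearing task instances puts them back into a schedulable state, so the
    # corresponding runs of dag_b are re-executed by the scheduler.
    dag_b.clear(
        start_date=datetime(2019, 12, 1),
        end_date=datetime(2019, 12, 24),
    )
```

Wrapping clear_dag_b in a PythonOperator as the final task of DAG A would give the "clear B when A completes" behavior the question asks about.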