airflow

Use separate environ and sys.path between dags

為{幸葍}努か Posted on 2019-12-02 18:30:44
Question: TL;DR: This question was originally based on a problem that was later determined to be caused by the issue now reflected in the updated title of this question. Skip to "Update 2" for the most relevant question details. I have a dag file that imports a Python list of dicts from another Python file in another location and creates a DAG based on the list's dict values, and Airflow is having a weird problem where it appears to see something different than when I run the dag file manually. Some snippet like... ... environ["PROJECT_HOME"] = "
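For context, a minimal sketch of the pattern described above (building a DAG dynamically from a list of dicts imported from another module); the module name, path, and dict keys are hypothetical, not taken from the question:

```python
# Hypothetical illustration of building a DAG from an imported list of dicts.
# The module `job_configs`, its path, and the dict keys are assumptions.
import sys
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# Make the external config location importable; the path is a placeholder.
sys.path.append("/opt/project/config")
from job_configs import JOB_CONFIGS  # e.g. [{"name": "job_a", "cmd": "echo a"}, ...]

dag = DAG(
    dag_id="dynamic_jobs",
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
)

# One task per dict in the imported list.
for cfg in JOB_CONFIGS:
    BashOperator(
        task_id=cfg["name"],
        bash_command=cfg["cmd"],
        dag=dag,
    )
```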

ETL model with DAGs and Tasks

帅比萌擦擦* Posted on 2019-12-02 18:15:19
Question: I'm trying to model my ETL jobs with Airflow. All jobs have roughly the same structure: extract from a transactional database (N extractions, each one reading 1/N of the table), then transform the data, and finally insert the data into an analytic database. So E >> T >> L. This company routine, USER >> PRODUCT >> ORDER, has to run every 2 hours; then I will have all the data from users and purchases. How can I model it? The company routine (USER >> PRODUCT >> ORDER) must be a DAG and each job must be a
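A rough sketch of one possible way to model this: a single DAG scheduled every 2 hours, an extract >> transform >> load chain per table, and the USER >> PRODUCT >> ORDER ordering enforced between the chains. The callables, table names, and schedule details are placeholder assumptions, not the asker's code:

```python
# Sketch only: one DAG every 2 hours, E >> T >> L per table,
# and USER >> PRODUCT >> ORDER ordering between the per-table chains.
from datetime import datetime
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def extract(table, **kwargs):
    print("extracting", table)

def transform(table, **kwargs):
    print("transforming", table)

def load(table, **kwargs):
    print("loading", table)

dag = DAG(
    dag_id="company_routine",
    start_date=datetime(2019, 1, 1),
    schedule_interval="0 */2 * * *",  # every 2 hours
)

previous_load = None
for table in ["user", "product", "order"]:
    e = PythonOperator(task_id="extract_%s" % table, python_callable=extract,
                       op_kwargs={"table": table}, dag=dag)
    t = PythonOperator(task_id="transform_%s" % table, python_callable=transform,
                       op_kwargs={"table": table}, dag=dag)
    l = PythonOperator(task_id="load_%s" % table, python_callable=load,
                       op_kwargs={"table": table}, dag=dag)
    e >> t >> l
    if previous_load is not None:
        previous_load >> e  # PRODUCT waits for USER, ORDER waits for PRODUCT
    previous_load = l
```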

How do I restart airflow webserver?

六月ゝ 毕业季﹏ Posted on 2019-12-02 18:00:01
I am using Airflow for my data pipeline project. I have configured my project in Airflow and started the Airflow server as a backend process using the following command: airflow webserver -p 8080 -D True. The server runs successfully in the background. Now I want to enable authentication in Airflow and have made the configuration changes in airflow.cfg, but the authentication functionality is not reflected in the server. When I stop and start the Airflow server on my local machine it works. So how can I restart my daemon airflow webserver process on my server? I advise running Airflow in a robust way, with auto-recovery with

Airflow mysql to gcp Dag error

老子叫甜甜 Posted on 2019-12-02 15:54:18
Question: I recently started working with Airflow. I'm working on a DAG that: queries the MySQL database, extracts the query results and stores them in a Cloud Storage bucket as a JSON file, and uploads the stored JSON file to BigQuery. The DAG imports three operators: MySqlOperator, MySqlToGoogleCloudStorageOperator and GoogleCloudStorageToBigQueryOperator. I am using Airflow 1.8.0, Python 3, and Pandas 0.19.0. Here is my DAG code: sql2gcp_csv = MySqlToGoogleCloudStorageOperator( task_id='sql2gcp_csv', sql='airflow_gcp/aws_sql
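For reference, a trimmed, hedged sketch of how these two contrib operators are commonly wired together in Airflow 1.x; the bucket, dataset, schema, and connection names below are placeholders and not taken from the question:

```python
# Sketch of the MySQL -> GCS -> BigQuery pattern (Airflow 1.x contrib operators).
# All bucket/dataset/schema/connection names below are placeholders.
from datetime import datetime
from airflow import DAG
from airflow.contrib.operators.mysql_to_gcs import MySqlToGoogleCloudStorageOperator
from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator

dag = DAG(dag_id="mysql_to_bq", start_date=datetime(2019, 1, 1),
          schedule_interval="@daily")

sql2gcs = MySqlToGoogleCloudStorageOperator(
    task_id="sql2gcs",
    sql="SELECT id FROM my_table",               # inline SQL; a .sql file path also works
    bucket="my-staging-bucket",
    filename="exports/my_table.json",            # exported as newline-delimited JSON
    mysql_conn_id="mysql_default",
    google_cloud_storage_conn_id="google_cloud_default",
    dag=dag,
)

gcs2bq = GoogleCloudStorageToBigQueryOperator(
    task_id="gcs2bq",
    bucket="my-staging-bucket",
    source_objects=["exports/my_table.json"],
    destination_project_dataset_table="my_project.my_dataset.my_table",
    source_format="NEWLINE_DELIMITED_JSON",
    schema_fields=[{"name": "id", "type": "INTEGER", "mode": "NULLABLE"}],  # placeholder schema
    write_disposition="WRITE_TRUNCATE",
    dag=dag,
)

sql2gcs >> gcs2bq
```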

Google Cloud Composer BigQuery Operator- Get Jobs API HTTPError 404

懵懂的女人 Posted on 2019-12-02 13:21:25
Question: I am trying to run a BigQueryOperator on Google Cloud Composer. I have already succeeded in running BigQueryCreateEmptyTableOperator and BigQueryTableDeleteOperator. Here is my code for the DAG: import datetime import os import logging from airflow import configuration from airflow import models from airflow import DAG from airflow.operators import email_operator from airflow.contrib.operators import bigquery_operator from airflow.contrib.operators import bigquery_check_operator from airflow.utils import
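As a point of reference, a minimal, hedged BigQueryOperator example of the kind the question is building toward, using the Composer-style imports shown above; the project, dataset, and table names are placeholders:

```python
# Minimal BigQueryOperator sketch (Airflow 1.x / Cloud Composer style imports).
# Project/dataset/table names are placeholders.
import datetime
from airflow import models
from airflow.contrib.operators import bigquery_operator

with models.DAG(
        dag_id="bq_example",
        start_date=datetime.datetime(2019, 1, 1),
        schedule_interval=datetime.timedelta(days=1)) as dag:

    run_query = bigquery_operator.BigQueryOperator(
        task_id="run_query",
        sql="SELECT 1 AS one",   # `sql=` in 1.10+; older releases use the `bql=` argument
        use_legacy_sql=False,
        destination_dataset_table="my_project.my_dataset.my_table",
        write_disposition="WRITE_TRUNCATE",
    )
```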

Airflow Task failure/retry workflow

守給你的承諾、 Posted on 2019-12-02 12:48:27
Question: I have retry logic for tasks and it's not clear how Airflow handles task failures when retries are turned on. The documentation just states that on_failure_callback gets triggered when a task fails, but if that task fails and is also marked for retry, does that mean that both the on_failure_callback and the on_retry_callback would be called? Answer 1: Retry logic/parameters will take effect before failure logic/parameters. So if you have a task set to retry twice, it will attempt to run again two
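To make the behaviour concrete, a small sketch assuming the answer's description holds: on_retry_callback fires on each failed attempt that still has retries left, and on_failure_callback fires only once retries are exhausted. The DAG and callables are hypothetical:

```python
# Sketch: on_retry_callback fires while retries remain, on_failure_callback
# fires once the final attempt fails (per the answer's description).
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def notify_retry(context):
    print("retrying, attempt %s" % context["task_instance"].try_number)

def notify_failure(context):
    print("task failed for good: %s" % context["task_instance"].task_id)

def always_fails():
    raise ValueError("boom")

dag = DAG(dag_id="retry_callbacks", start_date=datetime(2019, 1, 1),
          schedule_interval=None)

PythonOperator(
    task_id="flaky_task",
    python_callable=always_fails,
    retries=2,
    retry_delay=timedelta(minutes=1),
    on_retry_callback=notify_retry,
    on_failure_callback=notify_failure,
    dag=dag,
)
```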

duplicate key value violates unique constraint when adding path variable in airflow dag

我的未来我决定 Posted on 2019-12-02 10:31:41
To set up the connections and variables in Airflow I use a DAG; we do this in order to set Airflow up again quickly in case we ever have to rebuild everything. It does work, my connections and variables show up, but the task "fails". The error says that there is already a sql_path variable: [2018-03-30 19:42:48,784] {{models.py:1595}} ERROR - (psycopg2.IntegrityError) duplicate key value violates unique constraint "variable_key_key" DETAIL: Key (key)=(sql_path) already exists. [SQL: 'INSERT INTO variable (key, val, is_encrypted) VALUES (%(key)s, %(val)s, %(is_encrypted)s) RETURNING variable.id']
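One commonly used way to avoid this kind of INSERT conflict is to go through Airflow's own Variable API, since Variable.set replaces an existing key rather than inserting a duplicate row. A minimal sketch, with placeholder variable values:

```python
# Sketch: Variable.set() overwrites an existing key rather than inserting a
# duplicate row, so re-running the setup task does not hit the unique constraint.
from airflow.models import Variable

def setup_variables():
    variables = {
        "sql_path": "/usr/local/airflow/sql",    # placeholder value
        "data_path": "/usr/local/airflow/data",  # placeholder value
    }
    for key, value in variables.items():
        Variable.set(key, value)
```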

Airflow apply_defaults decorator reports Argument is required

陌路散爱 Posted on 2019-12-02 10:14:33
I recently ran into this nasty error where Airflow's apply_defaults decorator is throwing the following stack trace (my **kwargs do contain job_flow_id): File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed File "/mnt/airflow/dags/zanalytics-airflow/src/main/mysql_import/dags/mysql_import_dag.py", line 23, in <module> sync_dag_builder.build_sync_dag() File "/mnt/airflow/dags/zanalytics-airflow/src/main/mysql_import/dags/builders/sync_dag_builders/emr_sync_dag_builder.py", line 26, in build_sync_dag create_emr_task, terminate_emr_task = self._create_job_flow_tasks() File "
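For reference, a hedged sketch of how apply_defaults is normally used in a custom operator: with this decorator, every argument after self must be passed as a keyword argument, otherwise the decorator reports the argument as required/missing. The operator name and its job_flow_id handling below are illustrative assumptions, not the asker's code:

```python
# Sketch of a custom operator using apply_defaults (Airflow 1.x).
# The operator name and job_flow_id handling are illustrative assumptions.
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults

class EmrStepOperatorExample(BaseOperator):

    @apply_defaults
    def __init__(self, job_flow_id, *args, **kwargs):
        super(EmrStepOperatorExample, self).__init__(*args, **kwargs)
        self.job_flow_id = job_flow_id

    def execute(self, context):
        self.log.info("Using job flow %s", self.job_flow_id)

# apply_defaults expects keyword arguments, so instantiate like this:
# EmrStepOperatorExample(task_id="emr_step", job_flow_id="j-XXXXXXXX", dag=dag)
```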

Failed to extract xcom from airflow pod - Kubernetes Pod Operator

我只是一个虾纸丫 Posted on 2019-12-02 09:54:24
Question: While running a DAG that runs a jar using a Docker image, xcom_push=True is given, which creates another container alongside the Docker image in a single pod. DAG: jar_task = KubernetesPodOperator( namespace='test', image="path to image", image_pull_secrets="secret", image_pull_policy="Always", node_selectors={"d-type":"na-node-group"}, cmds=["sh","-c",..~running jar here~..], secrets=[secret_file], env_vars=environment_vars, labels={"k8s-app": "airflow"}, name="airflow-pod", config_file
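For context, the xcom sidecar container created by xcom_push=True expects the main container to write its result to /airflow/xcom/return.json before exiting; a hedged sketch of a KubernetesPodOperator configured that way (image, namespace, command, and names are placeholders, not the asker's values):

```python
# Sketch: with xcom_push=True the sidecar reads /airflow/xcom/return.json,
# so the main container must write its result there before exiting.
# Image, namespace, command and task names are placeholders.
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

jar_task = KubernetesPodOperator(
    namespace="test",
    image="my-registry/my-jar-image:latest",
    cmds=["sh", "-c",
          "java -jar /app/app.jar && "
          "mkdir -p /airflow/xcom && "
          "echo '{\"status\": \"ok\"}' > /airflow/xcom/return.json"],
    name="airflow-pod",
    task_id="jar_task",
    xcom_push=True,
    in_cluster=True,
    get_logs=True,
)
```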

Can a failed Airflow DAG Task Retry with changed parameter

徘徊边缘 Posted on 2019-12-02 09:40:42
With Airflow, is it possible to restart an upstream task if a downstream task fails? This seems to go against the "Acyclic" part of the term DAG, but I would think this is a common problem. Background: I'm looking into using Airflow to manage a data processing workflow that has been managed manually. There is a task that will fail if a parameter x is set too high, but increasing the parameter value gives better-quality results. We have not found a way to calculate a safe but maximally high parameter x. The manual process has been to restart the job, if it failed, with a lower parameter, until it
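One workaround that stays within a single task, rather than restarting an upstream task, is to let the task lower the parameter itself on each retry using the attempt number from the task context. A minimal sketch, with the starting value, step size, and failure condition as placeholder assumptions:

```python
# Sketch: lower the parameter on each retry attempt instead of restarting
# an upstream task. Starting value, step and failure check are placeholders.
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def process(**context):
    attempt = context["task_instance"].try_number  # 1 on the first run
    x = 100 - (attempt - 1) * 10  # start high, back off by 10 per retry
    print("running with parameter x = %d" % x)
    if x > 80:  # placeholder for "x is still too high, the job fails"
        raise ValueError("parameter x=%d too high" % x)

dag = DAG(dag_id="retry_with_lower_parameter", start_date=datetime(2019, 1, 1),
          schedule_interval=None)

PythonOperator(
    task_id="process",
    python_callable=process,
    provide_context=True,   # needed in Airflow 1.x to receive the context
    retries=5,
    retry_delay=timedelta(minutes=1),
    dag=dag,
)
```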