airflow

Need to access schedule time in DockerOperator in Airflow

两盒软妹~` submitted on 2020-01-06 05:08:39
Question: I need to access the schedule time in Airflow's DockerOperator. For example:

t1 = DockerOperator(
    task_id="task",
    dag=dag,
    image="test-runner:1.0",
    docker_url="xxx.xxx.xxx.xxx:2376",
    environment={"FROM": "{{ (execution_date + macros.timedelta(hours=6, minutes=30)).isoformat() }}"})

Basically, I need to pass the schedule time into the container as a Docker environment variable. Answer 1: First, macros are only rendered in fields listed in the operator's template_fields. Second, you need to check which version of Airflow you are using; if you are using 1.9 or below,
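For older Airflow versions where environment is not a templated field, a minimal sketch of the usual workaround is to subclass DockerOperator and add it yourself (the class and DAG names are illustrative, and on 1.10+ this subclass should be unnecessary):

# A minimal sketch assuming an older Airflow (1.9-style) where `environment`
# is not templated; DockerOperator on 1.10+ already templates it, so the
# subclass is only needed on old versions.
from datetime import datetime

from airflow import DAG
from airflow.operators.docker_operator import DockerOperator


class TemplatedEnvDockerOperator(DockerOperator):
    # Add `environment` to the templated fields so Jinja macros such as
    # `execution_date` are rendered before the container starts.
    template_fields = DockerOperator.template_fields + ('environment',)


dag = DAG('docker_schedule_time', start_date=datetime(2020, 1, 1))

t1 = TemplatedEnvDockerOperator(
    task_id="task",
    dag=dag,
    image="test-runner:1.0",
    docker_url="xxx.xxx.xxx.xxx:2376",
    environment={
        "FROM": "{{ (execution_date + macros.timedelta(hours=6, minutes=30)).isoformat() }}",
    },
)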

Dynamically change the number of task retries

人盡茶涼 submitted on 2020-01-06 03:19:05
Question: Retrying a task may be pointless. For example, if the task is a sensor and it failed because it had invalid credentials, then any future retries would inevitably fail. How can I define operators that can decide whether a retry is sensible? In Airflow 1.10.6, the logic that decides whether a task should be retried is in airflow.models.taskinstance.TaskInstance.handle_failure, making it impossible to define the behavior in the operator, as it is a responsibility of the task and not the operator. An
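As an illustration of one way operators commonly short-circuit pointless retries, the sketch below raises AirflowFailException for permanent errors; this exception exists in recent Airflow 1.10.x releases and fails the task without retrying, and the credential check here is a hypothetical placeholder:

# Sketch only, not the poster's code: raising AirflowFailException marks the
# task as failed without consuming any retries, while other exceptions still
# follow the normal `retries`/`retry_delay` settings.
from airflow.exceptions import AirflowFailException
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults


def check_credentials(conn_id):
    # Hypothetical placeholder: replace with a real validation call.
    return False


class CredentialAwareOperator(BaseOperator):

    @apply_defaults
    def __init__(self, conn_id, *args, **kwargs):
        super(CredentialAwareOperator, self).__init__(*args, **kwargs)
        self.conn_id = conn_id

    def execute(self, context):
        if not check_credentials(self.conn_id):
            # Permanent problem: fail immediately and skip configured retries.
            raise AirflowFailException("Invalid credentials for %s" % self.conn_id)
        # Transient errors raised past this point are retried as usual.
        self.log.info("Credentials valid, running task for %s", self.conn_id)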

Airflow unable to fetch the success status from dataflow

China☆狼群 submitted on 2020-01-05 13:54:02
Question: When submitting a Dataflow job from Airflow, it is unable to fetch the success status of the Dataflow job and keeps logging the error below.

{gcp_dataflow_hook.py:77} INFO - Google Cloud DataFlow job not available yet..

Airflow DAG:

t2 = DataFlowPythonOperator(
    task_id='google_dataflow',
    py_file='/Users/abc/sample.py',
    gcp_conn_id='connection_id',
    dataflow_default_options={
        "project": 'Project_id',
        "runner": "DataflowRunner",
        "staging_location": 'gs://Project_id/staging',
        "temp_location": 'gs
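The entry is cut off above; for reference, a sketch of the same operator call with the options dictionary completed using placeholder values (the bucket paths are illustrative, and the job-name remark in the comments is an assumption worth verifying against the pipeline code):

# Sketch of the same operator call with the options dict completed; the
# bucket paths and connection id are placeholders. A frequent cause of the
# "job not available yet" loop is a pipeline that sets its own --job_name,
# so the name no longer matches what the Airflow hook polls for (an
# assumption to verify, not a confirmed diagnosis).
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.dataflow_operator import DataFlowPythonOperator

dag = DAG('dataflow_example', start_date=datetime(2020, 1, 1), schedule_interval=None)

t2 = DataFlowPythonOperator(
    task_id='google_dataflow',
    py_file='/Users/abc/sample.py',
    gcp_conn_id='connection_id',
    dataflow_default_options={
        "project": 'Project_id',
        "runner": "DataflowRunner",
        "staging_location": 'gs://Project_id/staging',  # placeholder bucket path
        "temp_location": 'gs://Project_id/temp',        # placeholder bucket path
    },
    dag=dag,
)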

Run Task on Success or Fail but not on Skipped

好久不见. submitted on 2020-01-05 04:15:08
Question: Is there a way to run a task if the upstream task succeeded or failed, but not if the upstream was skipped? I am familiar with trigger_rule with the all_done parameter, as mentioned in this other question, but that triggers the task even when the upstream has been skipped. I only want the task to fire on the success or failure of the upstream task. Answer 1: I don't believe there is a trigger rule that covers only success and failure. What you could do is set up duplicate tasks, one with the trigger rule all_success
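A minimal sketch of that duplicate-task approach, assuming the second copy uses the all_failed trigger rule (task and DAG names are illustrative):

# Sketch only: `upstream` runs once; one copy of the follow-up fires on
# success (all_success) and the other on failure (all_failed). Neither
# rule fires when the upstream task is skipped.
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.utils.trigger_rule import TriggerRule

dag = DAG('success_or_fail_not_skipped',
          start_date=datetime(2020, 1, 1),
          schedule_interval=None)

upstream = DummyOperator(task_id='upstream', dag=dag)

on_success = DummyOperator(task_id='follow_up_on_success',
                           trigger_rule=TriggerRule.ALL_SUCCESS,
                           dag=dag)
on_failure = DummyOperator(task_id='follow_up_on_failure',
                           trigger_rule=TriggerRule.ALL_FAILED,
                           dag=dag)

upstream >> [on_success, on_failure]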

What is the best way to store login credentials on Airflow?

可紊 submitted on 2020-01-04 15:31:48
Question: I found out there are a lot of ways to store credentials: as variables, hooks, and other methods using encryption. I would like to know the best way to do it. Answer 1: Currently there are two ways of storing secrets: 1) Airflow Variables: the value of a variable is hidden in the UI if its key contains any of the words ('password', 'secret', 'passwd', 'authorization', 'api_key', 'apikey', 'access_token') by default, but it can be configured to show in clear text as shown in the image below. However, there is a known bug where
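The answer is truncated above; as an illustration of the other standard store, Airflow Connections, here is a short sketch of reading credentials from a connection inside DAG or hook code ('my_service_conn' is a placeholder connection id):

# Sketch: read credentials from an Airflow Connection instead of hard-coding
# them in DAG files; 'my_service_conn' is a placeholder connection id created
# via the UI or the `airflow connections` CLI. Passwords are stored encrypted
# in the metadata database when a Fernet key is configured.
from airflow.hooks.base_hook import BaseHook


def get_service_credentials(conn_id='my_service_conn'):
    conn = BaseHook.get_connection(conn_id)
    # login/password are decrypted transparently by Airflow.
    return conn.login, conn.password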

Airflow DAG dynamic structure

寵の児 submitted on 2020-01-04 14:24:09
Question: I was looking for a solution where I can decide the DAG structure when the DAG is triggered, as I'm not sure about the number of operators that I'll have to run. Please refer below for the execution sequence that I'm planning to create.

           |-- Task B.1 --|                |-- Task C.1 --|
           |-- Task B.2 --|                |-- Task C.2 --|
Task A ----|-- Task B.3 --|---> Task B --->|-- Task C.3 --|
           |     ....     |                |     ....     |
           |-- Task B.N --|                |-- Task C.N --|

I'm not sure about the value of N. Is this possible in Airflow? If so, how do I
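A minimal sketch of the usual workaround: the fan-out is generated when the scheduler parses the DAG file, with N read from an Airflow Variable rather than decided at trigger time (the Variable key and task names are illustrative):

# Sketch: N parallel B tasks converge on a join task, which fans out to N
# C tasks, mirroring the diagram above. N comes from an Airflow Variable
# ('branch_count' is a placeholder), so it is fixed at parse time, not at
# trigger time.
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.dummy_operator import DummyOperator

dag = DAG('dynamic_structure',
          start_date=datetime(2020, 1, 1),
          schedule_interval=None)

n = int(Variable.get('branch_count', default_var=3))

task_a = DummyOperator(task_id='task_a', dag=dag)
task_b_join = DummyOperator(task_id='task_b', dag=dag)

for i in range(1, n + 1):
    task_b_i = DummyOperator(task_id='task_b_%d' % i, dag=dag)
    task_c_i = DummyOperator(task_id='task_c_%d' % i, dag=dag)
    task_a >> task_b_i >> task_b_join >> task_c_i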

Is there a way to create dynamic workflows in Airflow

三世轮回 submitted on 2020-01-04 13:39:56
Question: So I have task A, which copies some unknown number of files into a folder. Task B runs on each of those files in the folder. I have no way of knowing the number of files beforehand, as they keep changing. Is there a way to make this work in Airflow?

spans = os.listdir('/home/abc/tmpFolder')
counter = 0
for s in spans:
    src_path = '/home/abc/tmpFolder' + s
    dst_path = "tmp/" + s
    counter += 1
    run_this = \
        FileToGoogleCloudStorageOperator(
            task_id='gcp_task_' + str(counter),
            src=src_path,
            dst=dst
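A hedged sketch of how this loop is typically completed at DAG-parse time, with one upload task per file; the bucket name is a placeholder and the '/' added to src_path fixes a likely bug in the snippet above:

# Sketch: one FileToGoogleCloudStorageOperator per file found when the DAG
# file is parsed. Note the '/' added to src_path (missing in the snippet
# above); the bucket name is a placeholder. Files that appear only after
# the run starts will not get tasks, because the list is built at parse time.
import os
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.file_to_gcs import FileToGoogleCloudStorageOperator

dag = DAG('upload_spans', start_date=datetime(2020, 1, 1), schedule_interval=None)

for counter, s in enumerate(os.listdir('/home/abc/tmpFolder'), start=1):
    FileToGoogleCloudStorageOperator(
        task_id='gcp_task_' + str(counter),
        src='/home/abc/tmpFolder/' + s,
        dst='tmp/' + s,
        bucket='my-bucket',                              # placeholder bucket
        google_cloud_storage_conn_id='google_cloud_default',
        dag=dag,
    )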