airflow

Convert python script to Airflow PythonOperator(s)

Submitted by 痞子三分冷 on 2019-12-24 07:14:41
Question: I have a working Python script which runs from a CronJob. I want to convert it to a DAG with PythonOperator(s), as we are now moving to Airflow. Say that I have the functions a(), b(), c(), d(), and their execution order is a -> b -> c -> d. Let's say the function bodies are: def a(): print("Happy") def b(): print("Birthday") def c(): print("to") def d(): print("you!") (This is just an example; my actual functions are more complex.) I have this DAG: args = { 'owner': 'airflow', 'start_date': airflow
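A minimal sketch of what such a conversion could look like on Airflow 1.x; the dag_id, schedule_interval, and start_date below are placeholders, and the four callables are the ones from the question:

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def a(): print("Happy")
def b(): print("Birthday")
def c(): print("to")
def d(): print("you!")

# Placeholder defaults; adjust owner/start_date to your environment.
default_args = {"owner": "airflow", "start_date": datetime(2019, 1, 1)}

with DAG("cron_to_airflow_example", default_args=default_args,
         schedule_interval="@daily", catchup=False) as dag:
    task_a = PythonOperator(task_id="a", python_callable=a)
    task_b = PythonOperator(task_id="b", python_callable=b)
    task_c = PythonOperator(task_id="c", python_callable=c)
    task_d = PythonOperator(task_id="d", python_callable=d)

    # Chain the tasks so they execute in the order a -> b -> c -> d.
    task_a >> task_b >> task_c >> task_d
```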

Airflow 1.9 : Run a task when upstream is skipped by shortcircuit

Submitted by 不羁岁月 on 2019-12-24 05:38:25
Question: I have a task that I'll call final that has multiple upstream connections. When one of the upstreams gets skipped by a ShortCircuitOperator, this task gets skipped as well. I don't want the final task to be skipped, as it has to report on DAG success. To avoid it being skipped I used trigger_rule='all_done', but it still gets skipped. If I use BranchPythonOperator instead of ShortCircuitOperator, the final task doesn't get skipped. It would seem like a branching workflow could be a solution, even though
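As a rough illustration of the branching alternative the question mentions (not the asker's actual DAG), a sketch where the branch picks one path and the final task uses trigger_rule='all_done'; the dag_id, task names, and branching condition are invented for the example:

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import BranchPythonOperator

def choose_path(**context):
    # Return the task_id to run; the other branch gets skipped.
    return "work" if context["execution_date"].day % 2 == 0 else "skip_marker"

with DAG("branch_instead_of_shortcircuit", start_date=datetime(2019, 1, 1),
         schedule_interval=None) as dag:
    branch = BranchPythonOperator(task_id="branch",
                                  python_callable=choose_path,
                                  provide_context=True)
    work = DummyOperator(task_id="work")
    skip_marker = DummyOperator(task_id="skip_marker")

    # With all_done, `final` runs once both parents are finished,
    # even if one of them was skipped by the branch.
    final = DummyOperator(task_id="final", trigger_rule="all_done")

    branch >> [work, skip_marker]
    [work, skip_marker] >> final
```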

Internal server error in Google Composer web UI [Error code 28]

Submitted by 三世轮回 on 2019-12-24 04:32:13
Question: We are using Google Composer for workflow orchestration. Randomly, we get the message "An internal server error occurred while authorizing your request. Error code 28" while opening the web UI. We don't know the cause of this issue. How can we fix it? Answer 1: This issue can occur for users who try to access the Airflow UI from certain locations. Note that direct access to the Airflow UI is not supported in Australia, New Zealand, and India, as explained here. The product team is working on the

Airflow - Experimental API returning 405s for some endpoints

Submitted by 给你一囗甜甜゛ on 2019-12-24 01:57:15
Question: I'm trying to set up my application to use Airflow's Experimental API. I'm using apache-airflow==1.10.2. Using the config straight out of the box (no authentication enabled), I'm able to create DAG runs using the POST /api/experimental/dags/<DAG_ID>/dag_runs endpoint. However, when I try to use GET /api/experimental/dags/<DAG_ID>/dag_runs I get 405s. I tried enabling authentication when I noticed that the GET endpoint is part of the www_rbac folder, but not part of the www file. To verify I
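For reproducing the behaviour, a small sketch of how the two endpoints from the question could be exercised with the requests library; the webserver URL, DAG id, and conf payload are placeholders:

```python
import requests

AIRFLOW_URL = "http://localhost:8080"  # placeholder webserver address
DAG_ID = "my_dag"                      # placeholder DAG id

# Trigger a run (the call that worked for the asker).
resp = requests.post(
    "{}/api/experimental/dags/{}/dag_runs".format(AIRFLOW_URL, DAG_ID),
    json={"conf": {"key": "value"}},
)
print(resp.status_code, resp.text)

# List runs (the call that returned 405 for the asker on 1.10.2).
resp = requests.get(
    "{}/api/experimental/dags/{}/dag_runs".format(AIRFLOW_URL, DAG_ID),
)
print(resp.status_code, resp.text)
```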

Is it possible to update/overwrite the Airflow ['dag_run'].conf?

Submitted by 白昼怎懂夜的黑 on 2019-12-24 01:17:30
Question: We typically start Airflow DAGs with the trigger_dag CLI command. For example: airflow trigger_dag my_dag --conf '{"field1": 1, "field2": 2}' We access this conf in our operators using context['dag_run'].conf. Sometimes when the DAG breaks at some task, we'd like to "update" the conf and restart the broken task (and its downstream dependencies) with the new conf. For example: new conf --> {"field1": 3, "field2": 4} Is it possible to "update" the dag_run conf with a new JSON string like this?
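For context, a minimal sketch of how the conf described in the question is typically read inside a PythonOperator callable on Airflow 1.x; the task id and field names follow the example above, while the surrounding dag object is assumed to exist elsewhere:

```python
from airflow.operators.python_operator import PythonOperator

def use_conf(**context):
    # dag_run.conf holds whatever JSON was passed to `airflow trigger_dag --conf`.
    conf = context["dag_run"].conf or {}
    field1 = conf.get("field1")
    field2 = conf.get("field2")
    print("field1=%s field2=%s" % (field1, field2))

read_conf = PythonOperator(
    task_id="read_conf",
    python_callable=use_conf,
    provide_context=True,   # required on Airflow 1.x to receive the context
    dag=dag,                # assumes a DAG object named `dag` is in scope
)
```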

Manual DAG run set individual task state

Submitted by 妖精的绣舞 on 2019-12-24 00:59:56
Question: I have a DAG without a schedule (it is run manually as needed). It has many tasks. Sometimes I want to 'skip' some initial tasks by manually changing their state to SUCCESS. Changing the task state of a manually executed DAG fails, seemingly because of a bug in parsing the execution_date. Is there another way to individually set task states for a manually executed DAG? Example run below. The execution date of the task is 01-13T17:27:13.130427, and I believe the milliseconds are not being
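One possible workaround, sketched under the assumption that you can open a Python shell with access to the Airflow metadata database, is to look the task instance up directly and set its state; the dag_id and task_id below are placeholders, not the asker's:

```python
from airflow import settings
from airflow.models import TaskInstance
from airflow.utils.state import State

session = settings.Session()

# Find the most recent task instance for the manual run; matching on the
# stored execution_date avoids re-typing it (with microseconds) by hand.
ti = (
    session.query(TaskInstance)
    .filter(
        TaskInstance.dag_id == "my_manual_dag",
        TaskInstance.task_id == "initial_task",
    )
    .order_by(TaskInstance.execution_date.desc())
    .first()
)
if ti:
    ti.state = State.SUCCESS  # mark the task as done so the run skips it
    session.commit()
session.close()
```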

Programmatically clear the state of airflow task instances

Submitted by 懵懂的女人 on 2019-12-24 00:59:32
Question: I want to clear the tasks in DAG B when DAG A completes execution. Both A and B are scheduled DAGs. Is there any operator/way to clear the state of tasks and re-run DAG B programmatically? I'm aware of the CLI and Web UI options for clearing tasks. Answer 1: cli.py is an incredibly useful place to peek into the SQLAlchemy magic of Airflow. The clear command is implemented there: @cli_utils.action_logging def clear(args): logging.basicConfig( level=settings.LOGGING_LEVEL, format=settings.SIMPLE
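A hedged sketch of doing the same thing from Python rather than the CLI, assuming DAG B's dag_id is dag_b and the date range is only illustrative; it leans on DagBag and DAG.clear(), which is roughly what the CLI clear command wraps:

```python
from datetime import datetime
from airflow.models import DagBag

# Load DAG B from the dags folder and clear its task instances for a
# date range; the scheduler will then re-run the cleared tasks.
dag_b = DagBag().get_dag("dag_b")
if dag_b is not None:
    dag_b.clear(
        start_date=datetime(2019, 12, 1),
        end_date=datetime(2019, 12, 24),
    )
```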

How to get jobId that was submitted using Dataproc Workflow Template

Submitted by 爱⌒轻易说出口 on 2019-12-24 00:53:18
Question: I have submitted a Hive job using a Dataproc Workflow Template with the help of the Airflow operator (DataprocWorkflowTemplateInstantiateInlineOperator) written in Python. Once the job is submitted, some name is assigned as the jobId (example: job0-abc2def65gh12). Since I was not able to get the jobId, I tried to pass a jobId as a parameter through the REST API, which isn't working. Can I fetch the jobId or, if that's not possible, can I pass a jobId as a parameter? Answer 1: The JobId will be available as part of metadata
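A rough sketch of reading job IDs out of the instantiation operation's metadata via the Dataproc REST API, assuming the google-api-python-client is installed, application default credentials are configured, and the long-running operation name is known; the project, region, and operation id are placeholders:

```python
from googleapiclient.discovery import build

# Hypothetical operation name returned by the instantiate-inline call.
operation_name = "projects/my-project/regions/global/operations/OPERATION_ID"

dataproc = build("dataproc", "v1")  # uses application default credentials
op = dataproc.projects().regions().operations().get(
    name=operation_name
).execute()

# The workflow metadata describes the job graph; each node carries the
# generated jobId such as "job0-abc2def65gh12".
for node in op.get("metadata", {}).get("graph", {}).get("nodes", []):
    print(node.get("stepId"), node.get("jobId"))
```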

Airflow - Get start time of dag run

Submitted by 烂漫一生 on 2019-12-24 00:46:07
Question: Is it possible to get the actual start time of a DAG in Airflow? By start time I mean the exact time the first task of a DAG starts running. I know I can use macros to get the execution date. If the job is run using trigger_dag, this is what I would call a start time, but if the job is run on a daily schedule then {{ execution_date }} returns yesterday's date. I have also tried placing datetime.now().isoformat() in the body of the DAG code and then passing it to a task, but this seems to return the
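A small sketch, assuming Airflow 1.x, of reading the run's actual start time from the DagRun object inside a task rather than from {{ execution_date }}; the task id is a placeholder and the surrounding dag object is assumed to exist elsewhere:

```python
from airflow.operators.python_operator import PythonOperator

def report_start_time(**context):
    dag_run = context["dag_run"]
    # start_date on the DagRun is the wall-clock time the run actually
    # started, unlike execution_date (the start of the schedule interval).
    print("DAG run started at:", dag_run.start_date)
    print("execution_date:", context["execution_date"])

report = PythonOperator(
    task_id="report_start_time",
    python_callable=report_start_time,
    provide_context=True,  # Airflow 1.x
    dag=dag,               # assumes `dag` is defined elsewhere
)
```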

Airflow “none_failed” skipping when upstream skips

Submitted by 社会主义新天地 on 2019-12-24 00:36:57
Question: I have a workflow with two parallel processes (sentinel_run and sentinel_skip) which should run or be skipped based on a condition, and then join together (resolve). I need tasks directly downstream of either sentinel_ task to have cascaded skipping, but when it gets to the resolve task, resolve should run unless there are failures in either upstream process. Based on the documentation, the "none_failed" trigger rule should work: none_failed: all parents have not failed (failed or
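To make the setup concrete, a minimal sketch of the structure described in the question (two ShortCircuitOperator sentinels joining at a resolve task with the none_failed trigger rule); the dag_id, callables, and intermediate task names are placeholders:

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import ShortCircuitOperator
from airflow.utils.trigger_rule import TriggerRule

with DAG("none_failed_join_example", start_date=datetime(2019, 1, 1),
         schedule_interval=None) as dag:
    # One sentinel lets its branch run, the other short-circuits (skips) it.
    sentinel_run = ShortCircuitOperator(
        task_id="sentinel_run", python_callable=lambda: True)
    sentinel_skip = ShortCircuitOperator(
        task_id="sentinel_skip", python_callable=lambda: False)

    run_work = DummyOperator(task_id="run_work")
    skip_work = DummyOperator(task_id="skip_work")

    # The join task should run as long as nothing upstream actually failed,
    # even though one whole branch was skipped.
    resolve = DummyOperator(task_id="resolve",
                            trigger_rule=TriggerRule.NONE_FAILED)

    sentinel_run >> run_work >> resolve
    sentinel_skip >> skip_work >> resolve
```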