airflow

How to schedule a Python script in Google Cloud without using cron jobs?

时光总嘲笑我的痴心妄想 submitted on 2019-12-01 07:03:45
I have two Python scripts that run once a day in my local environment. One fetches data and the other formats it. Now I want to deploy those scripts to Google's cloud environment and run them once or twice a day. Can I do that using Google Cloud Functions, or do I need App Engine? Why no cron job: because I don't want my system/VM to run all day when it is not in use. Can I use Cloud Composer to achieve that? You can use Google Cloud Scheduler, which is a fully managed, enterprise-grade cron job scheduler. It allows you to schedule virtually any job, including batch, big data jobs, cloud
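For illustration, a minimal sketch of the Cloud Scheduler route, assuming the two scripts are wrapped in an HTTP-triggered Cloud Function that Scheduler calls on a cron schedule; the function name and the fetch/format helpers below are hypothetical stand-ins for the existing scripts:

```python
# main.py -- hypothetical HTTP-triggered Cloud Function.
# Cloud Scheduler can call its trigger URL on a schedule such as "0 6 * * *",
# so nothing has to stay running outside the daily invocation.

def fetch_data():
    # stand-in for the existing "fetch data" script
    return {"rows": []}

def format_data(raw):
    # stand-in for the existing "format data" script
    return raw

def run_daily_job(request):
    """Entry point that Cloud Scheduler hits via HTTP once or twice a day."""
    raw = fetch_data()
    format_data(raw)
    return "ok", 200
```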

Airflow: How to push an XCom value from PostgresOperator?

六眼飞鱼酱① submitted on 2019-12-01 06:35:47
I'm using Airflow 1.8.1 and I want to push the result of a SQL request from a PostgresOperator. Here are my tasks:

check_task = PostgresOperator(
    task_id='check_task',
    postgres_conn_id='conx',
    sql="check_task.sql",
    xcom_push=True,
    dag=dag)

def py_is_first_execution(**kwargs):
    value = kwargs['ti'].xcom_pull(task_ids='check_task')
    print 'count ----> ', value
    if value == 0:
        return 'next_task'
    else:
        return 'end-flow'

check_branch = BranchPythonOperator(
    task_id='is-first-execution',
    python_callable=py_is_first_execution,
    provide_context=True,
    dag=dag)

and here is my SQL script: select count(1) from
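A sketch of one common workaround, not the asker's code: in Airflow 1.8 the PostgresOperator runs the SQL without returning rows, so nothing useful lands in XCom even with xcom_push=True. Running the query from a PythonOperator via PostgresHook makes Airflow push the callable's return value to XCom automatically. The connection id comes from the question; the table name is a placeholder because the SQL above is truncated:

```python
from airflow.hooks.postgres_hook import PostgresHook
from airflow.operators.python_operator import PythonOperator

def check_count(**kwargs):
    hook = PostgresHook(postgres_conn_id='conx')
    # get_first() returns the first result row as a tuple, e.g. (0,)
    row = hook.get_first("select count(1) from some_table")  # placeholder table
    return row[0]  # the return value is pushed to XCom as 'return_value'

check_task = PythonOperator(
    task_id='check_task',
    python_callable=check_count,
    provide_context=True,
    dag=dag)  # 'dag' is the DAG object from the question
```

The branch callable can then xcom_pull(task_ids='check_task') and compare the pulled count against 0, as py_is_first_execution already does.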

Airflow Python Script with execution_date in op_kwargs

淺唱寂寞╮ submitted on 2019-12-01 05:22:25
With assistance from this answer https://stackoverflow.com/a/41730510/4200352 I am executing a Python file. I use PythonOperator and am trying to include the execution date as an argument passed to the script. I believe I can access it somehow through kwargs['execution_date']. The below fails:

DAG.py

from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from datetime import datetime, timedelta
import sys
import os
sys.path.append(os.path.abspath("/home/glsam
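For reference, a minimal sketch of how this is commonly wired in Airflow 1.x (the DAG id, callable, and dates are illustrative): with provide_context=True, the execution date arrives in the callable's kwargs, so it does not need to be passed through op_kwargs at all.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def my_task(**kwargs):
    # Airflow injects the task context, including the execution date
    execution_date = kwargs['execution_date']
    print('running for {}'.format(execution_date))

dag = DAG('exec_date_example',
          start_date=datetime(2017, 1, 1),
          schedule_interval='@daily')

run_it = PythonOperator(
    task_id='run_it',
    python_callable=my_task,
    provide_context=True,  # required in Airflow 1.x to receive context kwargs
    dag=dag)
```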

How do I set up Airflow's email configuration to send an email on errors?

你说的曾经没有我的故事 submitted on 2019-12-01 05:10:56
I'm trying to make an Airflow task intentionally fail and error out by passing in a Bash line (thisshouldnotrun) that doesn't work. Airflow is outputting the following:

[2017-06-15 17:44:17,869] {bash_operator.py:94} INFO - /tmp/airflowtmpLFTMX7/run_bashm2MEsS: line 7: thisshouldnotrun: command not found
[2017-06-15 17:44:17,869] {bash_operator.py:97} INFO - Command exited with return code 127
[2017-06-15 17:44:17,869] {models.py:1417} ERROR - Bash command failed
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python2.7/site-packages/airflow/models.py", line 1374, in run
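To turn a failure like this into an email, two pieces are normally involved: SMTP credentials in the [smtp] section of airflow.cfg (smtp_host, smtp_user, smtp_password, smtp_mail_from), and email_on_failure in the DAG's default_args. A sketch with placeholder addresses and dates:

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2017, 6, 1),
    'email': ['alerts@example.com'],  # placeholder recipient
    'email_on_failure': True,         # mail out when a task fails
    'email_on_retry': False,
}

dag = DAG('email_on_error_example',
          default_args=default_args,
          schedule_interval='@daily')

fail_task = BashOperator(
    task_id='fail_task',
    bash_command='thisshouldnotrun',  # exits with 127 and fails the task
    dag=dag)
```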

Apache Airflow 1.10.3: Executor reports task instance ??? finished (failed) although the task says its queued. Was the task killed externally?

瘦欲@ submitted on 2019-12-01 04:53:11
An Airflow ETL DAG hits this error every day. Our Airflow installation is using CeleryExecutor. The concurrency configs were:

# The amount of parallelism as a setting to the executor. This defines
# the max number of task instances that should run simultaneously
# on this airflow installation
parallelism = 32

# The number of task instances allowed to run concurrently by the scheduler
dag_concurrency = 16

# Are DAGs paused by default at creation
dags_are_paused_at_creation = True

# When not using

Apache Airflow : airflow initdb results in “ImportError: No module named json”

杀马特。学长 韩版系。学妹 submitted on 2019-12-01 03:55:58
On Ubuntu 16.04 with Python 2.7 as the default version, I am trying to install Apache Airflow but ran into several issues, and currently I see on airflow initdb:

Traceback (most recent call last):
  File "/usr/local/bin/airflow", line 21, in <module>
    from airflow import configuration
  File "/usr/local/lib/python2.7/dist-packages/airflow/__init__.py", line 40, in <module>
    from flask_admin import BaseView
  File "/usr/local/lib/python2.7/dist-packages/flask_admin/__init__.py", line 6, in <module>
    from .base import expose, expose_plugview, Admin, BaseView, AdminIndexView  # noqa: F401
  File "/usr/local/lib

Airflow - run task regardless of upstream success/fail

徘徊边缘 submitted on 2019-12-01 02:44:53
I have a DAG which fans out to multiple independent units in parallel. This runs in AWS, so we have tasks which scale our AutoScalingGroup up to the maximum number of workers when the DAG starts, and to the minimum when the DAG completes. The simplified version looks like this:

           | - - taskA - - |
           |               |
scaleOut - | - - taskB - - | - scaleIn
           |               |
           | - - taskC - - |

However, some of the tasks in the parallel set fail occasionally, and I can't get the scaleDown task to run when any of the A-C tasks fail. What's the best way to have a task execute at the end of the DAG, once all other tasks have
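The knob that usually addresses this is the operator's trigger_rule. A sketch, assuming the scale-in step can be any operator (the DummyOperator and task ids below are illustrative), with all_done so it fires once every upstream task has finished, whether it succeeded or failed:

```python
from airflow.operators.dummy_operator import DummyOperator
from airflow.utils.trigger_rule import TriggerRule

scale_in = DummyOperator(
    task_id='scaleIn',
    trigger_rule=TriggerRule.ALL_DONE,  # run once all upstream tasks are done, failed or not
    dag=dag)  # 'dag' is the DAG from the question

# Conceptual wiring, matching the diagram above:
# scaleOut -> [taskA, taskB, taskC] -> scale_in
```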

Jobs not executing via Airflow that runs celery with RabbitMQ

筅森魡賤 submitted on 2019-11-30 23:26:21
Below is the config I'm using:

[core]
# The home folder for airflow, default is ~/airflow
airflow_home = /root/airflow

# The folder where your airflow pipelines live, most likely a
# subfolder in a code repository
dags_folder = /root/airflow/dags

# The folder where airflow should store its log files. This location
base_log_folder = /root/airflow/logs

# An S3 location can be provided for log backups
# For S3, use the full URL to the base folder (starting with "s3://...")
s3_log_folder = None

# The executor class that airflow should use. Choices include
# SequentialExecutor, LocalExecutor,