airflow

Airflow: Log file isn't local, Unsupported remote log location

Anonymous (unverified), submitted on 2019-12-03 01:12:01
Question: I am not able to see the logs attached to the tasks from the Airflow UI. Log-related settings in the airflow.cfg file are:
remote_base_log_folder =
base_log_folder = /home/my_projects/ksaprice_project/airflow/logs
worker_log_server_port = 8793
child_process_log_directory = /home/my_projects/ksaprice_project/airflow/logs/scheduler
Although I am setting remote_base_log_folder it is trying to fetch the log from http://:8793/log/tutorial/print_date/2017-08-02T00:00:00 - I don't understand this behavior. According to the settings the workers should
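
A quick way to check which of these settings the running scheduler and webserver actually see is to read them back through Airflow's own configuration module. A minimal diagnostic sketch, assuming an Airflow 1.x install; the section names follow a 1.8-era airflow.cfg and may differ in other versions. The empty host in http://:8793/... also suggests no worker hostname was recorded for that task instance, which is worth checking alongside the remote log settings.

    # Hypothetical diagnostic: print the log-related settings Airflow has actually loaded.
    from airflow import configuration

    settings = [
        ("core", "remote_base_log_folder"),
        ("core", "base_log_folder"),
        ("celery", "worker_log_server_port"),
        ("scheduler", "child_process_log_directory"),
    ]
    for section, key in settings:
        try:
            print("[%s] %s = %r" % (section, key, configuration.get(section, key)))
        except Exception as exc:  # the key may live in another section in your version
            print("[%s] %s -> could not read (%s)" % (section, key, exc))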

Make custom Airflow macros expand other macros

Anonymous (unverified), submitted on 2019-12-03 01:10:02
Question: Is there any way to make a user-defined macro in Airflow which is itself computed from other macros?

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    'simple',
    schedule_interval='0 21 * * *',
    user_defined_macros={
        'next_execution_date': '{{ dag.following_schedule(execution_date) }}',
    },
)

task = BashOperator(
    task_id='bash_op',
    bash_command='echo "{{ next_execution_date }}"',
    dag=dag,
)

The use case here is to back-port the new Airflow v1.8 next_execution_date macro to work in Airflow v1.7.
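
The usual workaround (a sketch, not necessarily the accepted answer verbatim) is to register a callable instead of a template string, because values in user_defined_macros are inserted into the Jinja context as-is and are never themselves rendered. The name compute_next_execution_date below is only an illustrative choice; dag and execution_date are already available in the template context in Airflow 1.7:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    dag = DAG(
        'simple',
        start_date=datetime(2017, 1, 1),
        schedule_interval='0 21 * * *',
        # A callable passes through untouched and can be invoked inside the template.
        user_defined_macros={
            'compute_next_execution_date':
                lambda dag, execution_date: dag.following_schedule(execution_date),
        },
    )

    task = BashOperator(
        task_id='bash_op',
        bash_command='echo "{{ compute_next_execution_date(dag, execution_date) }}"',
        dag=dag,
    )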

Airflow + Cluster + Celery + SQS - Airflow Worker: 'Hub' object has no attribute '_current_http_client'

Anonymous (unverified), submitted on 2019-12-03 01:05:01
Question: I'm trying to cluster my Airflow setup and I'm using this article to do so. I just configured my airflow.cfg file to use the CeleryExecutor, I pointed my sql_alchemy_conn to my PostgreSQL database that's running on the same master node, I've set the broker_url to use AWS SQS (I didn't set the access_key_id or secret_key since it's running on an EC2 instance and doesn't need those), and I've set the celery_result_backend to my PostgreSQL server too. I saved my new airflow.cfg changes, I ran airflow initdb, and then I ran airflow scheduler
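
One way to narrow this down (not taken from the question) is to take Airflow out of the loop and check whether a bare Celery app can reach the same SQS broker, since this '_current_http_client' error usually originates in the Celery/kombu SQS transport rather than in Airflow itself. A hedged sketch, assuming the broker_url in airflow.cfg is simply 'sqs://' with instance-role credentials:

    from celery import Celery

    # Same broker URL as [celery] broker_url in airflow.cfg; credentials come
    # from the EC2 instance role, as in the question.
    app = Celery('probe', broker='sqs://')

    # Force a real connection attempt; if the SQS transport stack (kombu,
    # boto/pycurl) is broken, this tends to raise the underlying error directly.
    conn = app.connection()
    conn.ensure_connection(max_retries=1)
    print('Broker reachable:', conn.as_uri())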

Is it possible for Airflow scheduler to first finish the previous day's cycle before starting the next?

こ雲淡風輕ζ, submitted on 2019-12-03 01:02:50
Right now, the nodes in my DAG proceed to the next day's task before the rest of the nodes of that DAG finish. Is there a way for it to wait for the rest of the DAG to finish before moving on to the next day's DAG cycle? (I do have depends_on_past set to true, but that does not work in this case.) My DAG looks like this: O -> O -> O -> O -> O (an ASCII diagram and a tree-view screenshot were attached to the original question). Oleg Yamin answered: Might be a bit late for this answer, but I ran into the same issue, and the way I resolved it is that I added two extra tasks in each DAG: "Previous" at the start and "Complete" at the end. The Previous task is external
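
The answer is cut off above, but the pattern it describes (a "Previous" gate task plus a "Complete" task) is commonly built with an ExternalTaskSensor that waits for the same DAG's final task from the previous schedule interval. A rough sketch with Airflow 1.8-era imports; the DAG and task names are placeholders rather than the original answer's code, and the very first run needs its "complete" task marked manually (or a sensor timeout) since there is no earlier run to wait for:

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator
    from airflow.operators.sensors import ExternalTaskSensor

    dag = DAG('my_dag', start_date=datetime(2017, 1, 1), schedule_interval='@daily')

    # "previous" blocks today's run until yesterday's "complete" task succeeded.
    previous = ExternalTaskSensor(
        task_id='previous',
        external_dag_id='my_dag',
        external_task_id='complete',
        execution_delta=timedelta(days=1),  # look one schedule interval back
        dag=dag,
    )

    work = DummyOperator(task_id='work', dag=dag)  # stands in for the real pipeline
    complete = DummyOperator(task_id='complete', dag=dag)

    previous.set_downstream(work)
    work.set_downstream(complete)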

Install airflow package extras in PyCharm

Anonymous (unverified), submitted on 2019-12-03 00:56:02
Question: I want to use the Airflow package extras s3 and postgres in PyCharm but do not know how to install them (on macOS Sierra). My attempts so far: Airflow itself can be installed from Preferences > Project > Project Interpreter > + but not the extras, as far as I can work out. The extras can be installed with pip in the terminal using $ pip install airflow[s3,postgres], but they end up in a different interpreter (~/anaconda) than the one used by PyCharm (/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7). Checking the
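
One detail worth spelling out (not in the question): installing through the interpreter PyCharm is configured to use guarantees the extras land in the right site-packages. A small sketch that can be run from PyCharm's Python console, which runs that project interpreter:

    import subprocess
    import sys

    # sys.executable is the project interpreter itself, so pip installs the
    # extras into the same environment PyCharm resolves imports against.
    subprocess.check_call([
        sys.executable, '-m', 'pip', 'install', 'airflow[s3,postgres]',
    ])

The same can be done from a terminal by invoking that interpreter's python binary directly with -m pip install 'airflow[s3,postgres]'.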

Airflow: Tasks queued but not running

Anonymous (unverified), submitted on 2019-12-03 00:56:02
Question: I am new to Airflow and trying to set up Airflow to run ETL pipelines. I was able to install airflow, postgres, celery, and rabbitmq. I am able to test-run the tutorial DAG. When I try to schedule the jobs, the scheduler is able to pick them up and queue the jobs, which I can see on the UI, but the tasks are not running. Could somebody help me fix this issue? I believe I am missing the most basic Airflow concept here. Below is my airflow.cfg:
[core]
airflow_home = /root/airflow
dags_folder = /root/airflow/dags
base_log
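
Since the excerpt is cut off, here is only a generic diagnostic rather than the fix from the thread: when tasks sit in the queued state, querying the metadata database for those task instances shows which queue and host they were assigned to, which usually points at the executor or worker configuration. A sketch using Airflow 1.x ORM imports:

    from airflow import settings
    from airflow.models import TaskInstance

    session = settings.Session()
    queued = (session.query(TaskInstance)
                     .filter(TaskInstance.state == 'queued')
                     .all())
    for ti in queued:
        # An empty hostname or an unexpected queue suggests the executor or
        # worker side, not the DAG definition itself.
        print(ti.dag_id, ti.task_id, ti.execution_date, ti.queue, ti.hostname)
    session.close()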

Airflow worker is not listening to default rabbitmq queue

Anonymous (unverified), submitted on 2019-12-03 00:48:01
Question: I have configured Airflow with a RabbitMQ broker. The services airflow worker, airflow scheduler, and airflow webserver are running without any errors. The scheduler is pushing the tasks to execute on the default RabbitMQ queue. Even when I tried airflow worker -q=default, the worker is still not receiving tasks to run. My airflow.cfg settings file:
[core]
# The home folder for airflow, default is ~/airflow
airflow_home = /home/my_projects/ksaprice_project/airflow
# The folder where your airflow pipelines live, most likely a
# subfolder in a code repository
#
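
A point worth double-checking here (beyond what the excerpt shows) is that the queue a task is routed to matches the queue the worker consumes: Airflow uses the operator's queue argument, falling back to default_queue under [celery] in airflow.cfg, and airflow worker -q <name> must name that same queue. A hedged sketch, assuming Airflow 1.x; the DAG below is only illustrative:

    from datetime import datetime

    from airflow import DAG
    from airflow import configuration
    from airflow.operators.bash_operator import BashOperator

    # The queue tasks go to when an operator does not set one explicitly.
    print('default_queue =', configuration.get('celery', 'default_queue'))

    dag = DAG('queue_probe', start_date=datetime(2017, 1, 1), schedule_interval=None)

    # Pinning the queue on the operator makes the routing explicit; the worker
    # must then be started with: airflow worker -q default
    task = BashOperator(
        task_id='say_hello',
        bash_command='echo hello',
        queue='default',
        dag=dag,
    )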

Airflow task runs throw jinja2.exceptions.TemplateNotFound

Anonymous (unverified), submitted on 2019-12-03 00:36:02
This is a pitfall caused by Airflow using Jinja2 as its template engine: because .sh is one of BashOperator's template extensions, a bash_command ending in .sh is treated as the path of a Jinja template file, so a trailing space must be added at the end of the command:

t2 = BashOperator(
    task_id='sleep',
    bash_command="/home/batcher/test.sh",    # this fails with a "Jinja template not found" error
    # bash_command="/home/batcher/test.sh ", # this works (note the trailing space)
    dag=dag)

References:
https://stackoverflow.com/questions/42147514/templatenotfound-error-when-running-simple-airflow-bashoperator
https://cwiki.apache.org/confluence/display/AIRFLOW/Common+Pitfalls
Original post: https://www.cnblogs.com/cord/p/9226628.html

airflow + CeleryExecutor environment setup

Anonymous (unverified), submitted on 2019-12-03 00:36:02
mysql -> backend database
redis -> broker
CeleryExecutor -> executor

Install the Python Anaconda environment

Add a py user:
# useradd py
Set its password:
# passwd py
Create the Anaconda install path:
# mkdir /anaconda
Grant ownership:
# chown -R py:py /anaconda
Upload the Anaconda installer and run it as the py user:
$ chmod +x Anaconda3-5.1.0-Linux-x86_64.sh
$ ./Anaconda3-5.1.0-Linux-x86_64.sh
Welcome to Anaconda3 5.1.0
In order to continue the installation process, please review the license
......
- Press ENTER to confirm the location
- Press CTRL-C to abort the installation
- Or specify a different location below
[/home/py/anaconda3] >>> /anaconda/anaconda3
Enter a custom install path here, or just press Enter to accept the default.
Then add Anaconda to the environment variables and make them take effect:
$

Setting up Airflow with the BigQuery operator

狂风中的少年, submitted on 2019-12-03 00:30:56
I am experimenting with Airflow for data pipelines. Unfortunately I cannot get it to work with the BigQuery operator so far. I have searched for a solution to the best of my ability but I am still stuck. I am using the SequentialExecutor running locally. Here is my code:

from airflow import DAG
from airflow.contrib.operators.bigquery_operator import BigQueryOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2015, 6, 1),
    'email': ['example@gmail.com'],
    'email_on_failure': False,
    'email_on_retry': False,
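
The excerpt stops before the DAG and the operator themselves, so the following is only an illustrative completion rather than the asker's code; parameter names differ across versions (bql was later renamed sql), and a working GCP connection must exist for the connection id used:

    from datetime import datetime
    from airflow import DAG
    from airflow.contrib.operators.bigquery_operator import BigQueryOperator

    default_args = {
        'owner': 'airflow',
        'depends_on_past': False,
        'start_date': datetime(2015, 6, 1),
        'email': ['example@gmail.com'],
        'email_on_failure': False,
        'email_on_retry': False,
    }

    dag = DAG('bigquery_example', default_args=default_args, schedule_interval='@daily')

    bq_task = BigQueryOperator(
        task_id='bq_query',
        bql='SELECT 1',                       # renamed to sql= in later Airflow releases
        bigquery_conn_id='bigquery_default',  # a GCP connection (keyfile/project) must exist under this id
        dag=dag,
    )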