Airflow

Airflow authentication setup fails with “AttributeError: can't set attribute”

Submitted 2019-12-19 15:27:01
Question: The Airflow version 1.8 password authentication setup as described in the docs fails at the step user.password = 'set_the_password' with the error AttributeError: can't set attribute.

Answer 1: It's better to simply use the new method of PasswordUser, _set_password:

    # Instead of user.password = 'password'
    user._set_password = 'password'

Answer 2: This is due to an update of SQLAlchemy to a version >= 1.2 that introduced a backwards-incompatible change. You can fix this by explicitly installing a SQLAlchemy version below 1.2.
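For context, a minimal sketch of the full user-creation snippet from the Airflow 1.8 password-auth docs with the workaround from Answer 1 applied (the username/email values are placeholders); Answer 2's alternative is to pin SQLAlchemy below 1.2 instead:

    from airflow import models, settings
    from airflow.contrib.auth.backends.password_auth import PasswordUser

    user = PasswordUser(models.User())
    user.username = 'admin'             # placeholder
    user.email = 'admin@example.com'    # placeholder
    # user.password = 'set_the_password' raises AttributeError with SQLAlchemy >= 1.2,
    # so assign via the setter name suggested in Answer 1 instead:
    user._set_password = 'set_the_password'

    session = settings.Session()
    session.add(user)
    session.commit()
    session.close()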

Installing Airflow

Submitted 2019-12-19 09:56:15
1. Environment preparation
1.1 Installation environment
1.2 Create a user
2. Install Airflow
2.1 Install Python
2.2 Install pip
2.3 Install the database
2.4 Install Airflow
2.4.1 Install the core module
2.4.2 Install the database and password modules
2.5 Configure Airflow
2.5.1 Set environment variables
2.5.2 Edit the configuration file
3. Start Airflow
3.1 Initialize the database
3.2 Create a user
3.3 Start Airflow
4. Run tasks
5. Install Celery
5.1 Install the Celery module
5.2 Install a Celery broker
5.2.1 Use RabbitMQ as the broker
5.2.2 Use Redis as the broker
5.3 Edit the Airflow configuration file to enable Celery
5.4 Test Celery
5.5 Deploy multiple workers
6. Issues

Official documentation: http://airflow.incubator.apache.org/project.html

1. Environment preparation
1.1 Installation environment
CentOS 6.7 (Docker), Python 2.7.13

    docker run --name airflow -h airflow -dti --net hadoopnet --ip=172.18.0.20 -p 10131:22 -v /dfs/centos/airflow/home:/home -v /dfs

How to schedule a Python script in Google Cloud without using cron jobs?

Submitted 2019-12-19 09:05:11
Question: I have two Python scripts running once a day in my local environment. One fetches data and the other formats it. Now I want to deploy those scripts to Google's cloud environment and run them once or twice a day. Can I do that using Google Cloud Functions, or do I need App Engine? Why no cron job: because I don't want my system/VM to run all day (when not in use). Can I use Cloud Composer to achieve that? Answer 1: You can use Google Cloud Scheduler, which is a fully managed, enterprise-grade cron job scheduler
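As a rough illustration of the Cloud Scheduler route (not taken from the truncated answer): the two scripts can be wrapped in an HTTP-triggered Cloud Function, and a Cloud Scheduler job then calls that URL on a cron schedule, so no VM has to stay up. The function and helper names below are hypothetical placeholders for the asker's scripts:

    # main.py for an HTTP-triggered Cloud Function (Python runtime).
    # fetch_data / format_data stand in for the two existing scripts.

    def fetch_data():
        # placeholder for the "fetch data" script
        return [{"id": 1, "value": 42}]

    def format_data(rows):
        # placeholder for the "format data" script
        return [f"{r['id']}: {r['value']}" for r in rows]

    def run_pipeline(request):
        # entry point; a Cloud Scheduler job invokes this URL once or twice a day
        rows = fetch_data()
        formatted = format_data(rows)
        return f"processed {len(formatted)} rows", 200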

Airflow depends_on_past for whole DAG

Submitted 2019-12-19 07:06:31
Question: Is there a way in Airflow to use depends_on_past for an entire DagRun, not just applied to a task? I have a daily DAG, and the Friday DagRun errored on the 4th task; however, the Saturday and Sunday DagRuns still ran as scheduled. Using depends_on_past = True would have paused the DagRun at the same 4th task, but the first 3 tasks would still have run. I can see in the DagRun DB table there is a state column that contains failed for the Friday DagRun. What I want is a way of configuring
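For reference, a minimal sketch of the per-task setting the question is contrasting with: depends_on_past in default_args applies to each task individually, not to the DagRun as a whole (DAG id, dates and interval below are illustrative):

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator

    default_args = {
        'owner': 'airflow',
        'start_date': datetime(2019, 12, 1),
        'depends_on_past': True,  # each task waits only on its own previous instance
    }

    dag = DAG('daily_pipeline', default_args=default_args, schedule_interval='@daily')

    task_1 = DummyOperator(task_id='task_1', dag=dag)
    task_2 = DummyOperator(task_id='task_2', dag=dag)
    task_1 >> task_2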

Airflow - run task regardless of upstream success/fail

Submitted 2019-12-19 05:14:56
Question: I have a DAG which fans out to multiple independent units in parallel. This runs in AWS, so we have tasks which scale our AutoScalingGroup up to the maximum number of workers when the DAG starts, and down to the minimum when the DAG completes. The simplified version looks like this:

               | - - taskA - - |
               |               |
    scaleOut - | - - taskB - - | - scaleIn
               |               |
               | - - taskC - - |

However, some of the tasks in the parallel set fail occasionally, and I can't get the scaleDown task to run when any of the A-C tasks
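The excerpt cuts off before any answer, but one standard way to get this behavior (an assumption on my part, not stated above) is Airflow's trigger_rule parameter: setting trigger_rule='all_done' on the scale-in task makes it run once all upstream tasks have finished, whether they succeeded or failed. A hedged sketch of the fan-out/fan-in shape described, with placeholder operators:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator

    dag = DAG('scale_out_in', start_date=datetime(2019, 12, 1), schedule_interval='@daily')

    scale_out = DummyOperator(task_id='scaleOut', dag=dag)
    scale_in = DummyOperator(
        task_id='scaleIn',
        trigger_rule='all_done',  # run once upstream tasks finish, regardless of success/fail
        dag=dag,
    )

    for name in ('taskA', 'taskB', 'taskC'):
        t = DummyOperator(task_id=name, dag=dag)
        scale_out >> t >> scale_in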

How to nest an Airflow DAG dynamically?

Submitted 2019-12-19 03:24:33
Question: I have a simple DAG of three operators. The first one is a PythonOperator with our own functionality; the other two are standard operators from airflow.contrib (FileToGoogleCloudStorageOperator and GoogleCloudStorageToBigQueryOperator, to be precise). They work in sequence. Our custom task produces a number of files, typically between 2 and 5, depending on the parameters. All of these files have to be processed by subsequent tasks separately. That means I want several downstream branches, but
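The excerpt ends before any answer; as one common pattern sketch (not necessarily the post's solution), downstream branches can be created in a loop at DAG-parse time when the file count can be derived from the DAG's parameters. The names and the fixed count below are hypothetical:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    NUM_FILES = 3  # hypothetical; in the question this varies between 2 and 5

    def produce_files(**kwargs):
        print("custom task that writes the files")

    def process_file(index, **kwargs):
        print(f"processing file #{index}")

    dag = DAG('fan_out_example', start_date=datetime(2019, 12, 1), schedule_interval='@daily')

    produce = PythonOperator(task_id='produce_files', python_callable=produce_files,
                             provide_context=True, dag=dag)

    # one downstream branch per expected file
    for i in range(NUM_FILES):
        branch = PythonOperator(
            task_id=f'process_file_{i}',
            python_callable=process_file,
            op_kwargs={'index': i},
            provide_context=True,
            dag=dag,
        )
        produce >> branch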

How to pass a parameter to PythonOperator in Airflow

Submitted 2019-12-19 03:08:30
Question: I just started using Airflow; can anyone enlighten me how to pass a parameter into PythonOperator, like below:

    t5_send_notification = PythonOperator(
        task_id='t5_send_notification',
        provide_context=True,
        python_callable=SendEmail,
        op_kwargs=None,
        # op_kwargs=(key1='value1', key2='value2'),
        dag=dag,
    )

    def SendEmail(**kwargs):
        msg = MIMEText("The pipeline for client1 is completed, please check.")
        msg['Subject'] = "xxxx"
        msg['From'] = "xxxx"
        ......
        s = smtplib.SMTP('localhost')
        s.send_message(msg
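The answers are not included in this excerpt, but for reference, PythonOperator's op_kwargs takes a dict (not the tuple-like syntax commented out above), and with provide_context=True the values arrive in the callable's **kwargs. A small self-contained sketch along the lines of the question's code, with placeholder addresses and DAG settings:

    from datetime import datetime
    from email.mime.text import MIMEText
    import smtplib

    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    dag = DAG('notify_example', start_date=datetime(2019, 12, 1), schedule_interval='@daily')

    def SendEmail(**kwargs):
        client = kwargs.get('client', 'client1')    # supplied via op_kwargs
        msg = MIMEText(f"The pipeline for {client} is completed, please check.")
        msg['Subject'] = "Pipeline finished"         # placeholder values
        msg['From'] = "airflow@example.com"
        msg['To'] = "team@example.com"
        s = smtplib.SMTP('localhost')
        s.send_message(msg)
        s.quit()

    t5_send_notification = PythonOperator(
        task_id='t5_send_notification',
        provide_context=True,
        python_callable=SendEmail,
        op_kwargs={'client': 'client1'},             # a plain dict of keyword arguments
        dag=dag,
    )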

Airflow: Why is there a start_date for operators?

Submitted 2019-12-19 02:11:23
Question: I don't understand why we need a 'start_date' for the operators (task instances). Shouldn't the one that we pass to the DAG suffice? Also, if the current time is 7th Feb 2018, 8:30 am UTC, and I now set the start_date of the DAG to 7th Feb 2018, 0:00 am, with my cron expression for the schedule interval being 30 9 * * * (daily at 9:30 am, i.e. expecting it to run within the next hour), will my DAG run today at 9:30 am or tomorrow (8th Feb at 9:30 am)? Answer 1: Regarding start_date on the task instance, personally
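A sketch of the exact setup described, using the dates and cron expression from the question. One point that may help, offered as background rather than as the truncated answer: Airflow triggers a run at the end of its schedule interval, so the run covering the 7 Feb 09:30 interval would normally execute around 8 Feb 09:30.

    from datetime import datetime
    from airflow import DAG

    dag = DAG(
        'start_date_example',
        start_date=datetime(2018, 2, 7),   # 7 Feb 2018, 00:00 UTC, as in the question
        schedule_interval='30 9 * * *',    # daily at 09:30
    )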
