airflow

Facing an issue while configuring MySQL with Apache Airflow on Hadoop

柔情痞子 submitted on 2020-04-07 05:33:23
Question: I was trying to install and configure Apache Airflow on a three-node dev Hadoop cluster with the following configuration/versions: Operating System: Red Hat Enterprise Linux Server 7.7; Python 3.7.3; Anaconda 2; Spark 2.4.5. a) sudo yum install gcc gcc-c++ -y b) sudo yum install libffi-devel mariadb-devel cyrus-sasl-devel -y c) pip install
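A common first hurdle in this kind of setup is pointing Airflow's metadata database at MySQL. As a rough sanity check (not part of the question; the host, credentials, and database name below are placeholders), the connection URI that would go into sql_alchemy_conn in airflow.cfg can be tested with SQLAlchemy before running airflow initdb:

    from sqlalchemy import create_engine, text

    # Hypothetical connection URI; replace host, credentials, and database
    # with the values you intend to put into sql_alchemy_conn in airflow.cfg.
    engine = create_engine("mysql+pymysql://airflow:airflow@db-host:3306/airflow_db")

    with engine.connect() as conn:
        # If this prints the MySQL version, Airflow should be able to use the
        # same URI for its metadata database.
        print(conn.execute(text("SELECT VERSION()")).scalar())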

Airflow BashOperator: Passing parameter to external bash script

五迷三道 submitted on 2020-03-25 18:55:30
Question: I'm having problems passing parameters to an external bash script from a BashOperator. When I run a local command, the params are substituted correctly: log_cleanup = """ echo "{{ params.BASE_LOG_FOLDER }}" """ log_cleanup_task = BashOperator( task_id='log_cleanup_task', provide_context=True, bash_command=log_cleanup, params={'BASE_LOG_FOLDER': "/var/opt"}, dag=dagInstance, ) prints "/var/opt" (without the double quotes). But if I call an external bash script, the params don't substitute in.
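One pattern that usually resolves this (a sketch, not the asker's solution; the script path is a placeholder) is to render the Jinja value in the bash_command itself and hand it to the external script as a positional argument, rather than expecting substitution to happen inside the .sh file:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    # Minimal stand-in for the question's dagInstance.
    dagInstance = DAG("log_cleanup_dag", start_date=datetime(2020, 1, 1),
                      schedule_interval=None)

    log_cleanup_task = BashOperator(
        task_id="log_cleanup_task",
        # The template is rendered here; the external script reads the value as "$1".
        bash_command="bash /path/to/log_cleanup.sh '{{ params.BASE_LOG_FOLDER }}'",
        params={"BASE_LOG_FOLDER": "/var/opt"},
        dag=dagInstance,
    )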

How to import external scripts in an Airflow DAG with Python?

我只是一个虾纸丫 submitted on 2020-03-25 03:40:31
Question: I have the following structure: And I try to import the script inside some files of the inbound_layer like so: import calc However, I get the following error message in the Airflow web UI: Any idea? Answer 1: I needed to insert the following snippet at the top of ren.py : import sys, os from airflow.models import Variable DAGBAGS_DIR = Variable.get('DAGBAGS_DIR') sys.path.append(DAGBAGS_DIR + '/bi/inbound_layer/') This way the packages in that folder become available. Answer 2: For an Airflow DAG, when you
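A cleaned-up version of the approach from Answer 1 (the DAGBAGS_DIR Variable and the bi/inbound_layer layout are assumptions taken from that answer) would sit at the top of the DAG file:

    import os
    import sys

    from airflow.models import Variable

    # DAGBAGS_DIR is an Airflow Variable (Admin -> Variables) pointing at the
    # root folder that contains the DAG packages; the sub-path follows the
    # answer's example layout.
    DAGBAGS_DIR = Variable.get('DAGBAGS_DIR')
    sys.path.append(os.path.join(DAGBAGS_DIR, 'bi', 'inbound_layer'))

    import calc  # noqa: E402  -- resolvable now that its folder is on sys.path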

Apache Airflow tasks are stuck in an 'up_for_retry' state

送分小仙女□ submitted on 2020-03-22 09:26:12
Question: I've been setting up an Airflow cluster on our system, and previously it was working. I'm not sure what I may have done to change this. I have a DAG which I want to run on a schedule. To make sure it's working, I'd also like to trigger it manually. Neither of these seems to be working at the moment, and no logs seem to be written for the task instances. The only logs available are the Airflow scheduler logs, which generally look healthy. I am just constantly met with this message: Task
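For context on the state itself (an illustrative sketch, not taken from the question): a task enters up_for_retry after a failed attempt when it still has retries left, and it stays there until retry_delay elapses and the scheduler re-queues it, so these settings are the first thing to compare against the scheduler's behaviour:

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator

    # Illustrative retry settings; a task in 'up_for_retry' waits out
    # retry_delay before the scheduler schedules the next attempt.
    default_args = {
        "retries": 3,
        "retry_delay": timedelta(minutes=5),
    }

    dag = DAG("retry_example", default_args=default_args,
              start_date=datetime(2020, 1, 1), schedule_interval="@daily")

    noop = DummyOperator(task_id="noop", dag=dag)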

Any success story installing a private dependency on GCP Composer Airflow?

匆匆过客 submitted on 2020-03-22 07:55:09
Question: Background info: Normally, within a container environment, I can easily install my private dependency with a requirements.txt like this: --index-url https://user:pass@some_repo.jfrog.io/some_repo/api/pypi/pypi/simple some-private-lib The package "some-private-lib" is the one I want to install. Issue: Within the GCP Composer environment, I tried the gcloud command ( gcloud composer environments update ENV_NAME --update-pypi-packages-from-file ./requirements.txt --location LOCATION ), but it
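One workaround commonly suggested for Composer (hedged: confirm against the current Composer documentation; the bucket name is a placeholder) is to upload a pip.conf that names the private index into the environment's GCS bucket under config/pip/, so that subsequent package installs can resolve the private repository, for example with the google-cloud-storage client:

    from google.cloud import storage

    # Hypothetical pip.conf pointing at the private repository from the question.
    PIP_CONF = ("[global]\n"
                "index-url = https://user:pass@some_repo.jfrog.io/some_repo/api/pypi/pypi/simple\n")

    client = storage.Client()
    # Placeholder name; use the bucket that backs your Composer environment.
    bucket = client.bucket("us-central1-my-env-abcd1234-bucket")
    bucket.blob("config/pip/pip.conf").upload_from_string(PIP_CONF)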

Generating a UUID and using it across an Airflow DAG

二次信任 submitted on 2020-03-16 08:34:40
Question: I'm trying to create a dynamic Airflow DAG that has the following two tasks: Task 1: creates files with a generated UUID as part of their name. Task 2: runs a check on those files. So I define a variable FILE_UUID and set it as follows: str(uuid.uuid4()). I also created a constant file name: MY_FILE = '{file_uuid}_file.csv'.format(file_uuid=FILE_UUID). Then Task 1 is a BashOperator that gets MY_FILE as part of the command, and it creates a file successfully. I can see the generated files include
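The usual fix for this (a sketch, with hypothetical task names and file paths) is to generate the UUID once inside a task and share it through XCom, rather than at module level, where every scheduler parse of the DAG file produces a different value:

    import uuid
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    from airflow.operators.python_operator import PythonOperator

    dag = DAG("uuid_example", start_date=datetime(2020, 1, 1),
              schedule_interval=None)

    def make_uuid(**context):
        # The return value is pushed to XCom under the 'return_value' key.
        return str(uuid.uuid4())

    gen_uuid = PythonOperator(task_id="gen_uuid", python_callable=make_uuid,
                              provide_context=True, dag=dag)

    create_file = BashOperator(
        task_id="create_file",
        bash_command="touch /tmp/{{ ti.xcom_pull(task_ids='gen_uuid') }}_file.csv",
        dag=dag,
    )

    check_file = BashOperator(
        task_id="check_file",
        bash_command="ls -l /tmp/{{ ti.xcom_pull(task_ids='gen_uuid') }}_file.csv",
        dag=dag,
    )

    gen_uuid >> create_file >> check_file

Both downstream tasks render the same UUID because they pull it from the same XCom entry for the run, instead of re-evaluating uuid.uuid4() at parse time.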

Installing Airflow on CentOS 7

拈花ヽ惹草 submitted on 2020-03-14 14:37:47
Test environment: CentOS 7, Python 3.6

Installation and configuration:

1. Check whether gcc is available; if not, install it: yum install gcc (if the later Airflow installation fails, run this again, as it will update the packages) [this step is important]

2. Install the interpreter and dependencies:
yum install -y python36
yum install -y python36-pip
yum install -y python36-devel
pip3 install paramiko
Before installing Airflow, the development libraries it depends on are also needed:
yum -y install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel
Install Airflow: pip3 install apache-airflow
Install pymysql: pip3 install pymysql

3. Configure environment variables:
# vi /etc/profile
#airflow
export AIRFLOW_HOME=/software/airflow
# source /etc/profile

Initialization

1. Initialize the database tables (the local SQLite database is used by default):