airflow

Airflow dynamic tasks at runtime

Anonymous (unverified), submitted 2019-12-03 01:38:01

Question: Other questions about 'dynamic tasks' seem to address dynamic construction of a DAG at schedule or design time. I'm interested in dynamically adding tasks to a DAG during execution.

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator
    from airflow.operators.python_operator import PythonOperator
    from datetime import datetime

    dag = DAG('test_dag', description='a test',
              schedule_interval='0 0 * * *',
              start_date=datetime(2018, 1, 1), catchup=False)

    def make_tasks():
        du1 = DummyOperator(task_id='dummy1', dag=dag)
        du2
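The excerpt cuts off before any answer, but the usual advice for this question is worth sketching: Airflow fixes a DagRun's task list when the run starts, so tasks cannot be appended mid-execution; dynamism is instead expressed at parse time, since the DAG file is re-parsed continuously. A minimal parse-time sketch under that assumption (TASK_COUNT is a hypothetical stand-in for whatever drives the dynamism, e.g. an Airflow Variable):

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator

    dag = DAG('test_dag_parse_time', description='a test',
              schedule_interval='0 0 * * *',
              start_date=datetime(2018, 1, 1), catchup=False)

    TASK_COUNT = 3  # hypothetical; re-evaluated on every parse of this file
    previous = DummyOperator(task_id='start', dag=dag)
    for i in range(TASK_COUNT):
        task = DummyOperator(task_id='dummy{}'.format(i + 1), dag=dag)
        previous >> task
        previous = task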

Cannot modify mapred.job.name at runtime. It is not in list of params that are allowed to be modified at runtime

Anonymous (unverified), submitted 2019-12-03 01:36:02

Question: I am trying to run a Hive job in Airflow. I made a custom JDBC connection, which you can see in the image. I can query Hive tables through the Airflow web UI (Data Profiling -> Ad Hoc Query). I also want to run a sample DAG file from the Internet:

    # File Name: wf_incremental_load.py
    from airflow import DAG
    from airflow.operators import BashOperator, HiveOperator
    from datetime import datetime, timedelta

    default_args = {
        'owner': 'airflow',
        'start_date': datetime(2019, 3, 13),
        'retries': 1,
        'retry_delay': timedelta(minutes=5)
    }

    dag = DAG('hive_test'
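The excerpt stops before an answer, but the error in the title is a Hive-side restriction: with SQL standard authorization enabled, HiveServer2 only lets clients set properties that match its whitelist, and Airflow's HiveOperator sets mapred.job.name for every query. A commonly cited fix (an assumption here, not from the excerpt) is to append the property to the whitelist in hive-site.xml and restart HiveServer2:

    <property>
      <name>hive.security.authorization.sqlstd.confwhitelist.append</name>
      <!-- |-separated Java regexes; airflow\.ctx\..* also covers the context
           variables Airflow injects. Adjust to your security policy. -->
      <value>mapred\.job\.name|airflow\.ctx\..*</value>
    </property>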

ImportError : cannot import DAG airflow

Anonymous (unverified), submitted 2019-12-03 01:35:01

Question: I have some simple code; I am trying to import DAG from airflow:

    from airflow import DAG
    from airflow.operators import BashOperator, S3KeySensor
    from datetime import datetime, timedelta
    import psycopg2
    from datetime import date, timedelta

    yesterday = date.today() - timedelta(1)
    yesterdayDate = yesterday.strftime('%Y-%m-%d')

But I am getting an ImportError:

    Traceback (most recent call last):
      File "airflow.py", line 9, in <module>
        from airflow import DAG
      File "/home/ubuntu/airflow/dags/airflow.py", line 9, in <module>
        from
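The excerpt ends before an answer, but the traceback itself shows the likely cause: the DAG file is named airflow.py, so "from airflow import DAG" resolves to the DAG file itself instead of the installed package. A minimal sketch of the fix (the new file name is hypothetical):

    # my_first_dag.py -- renamed from airflow.py so it no longer shadows the package
    from datetime import date, timedelta
    from airflow import DAG  # now resolves to the installed airflow package

    yesterday = date.today() - timedelta(1)
    yesterdayDate = yesterday.strftime('%Y-%m-%d')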

Airflow k8s operator xcom - Handshake status 403 Forbidden

Anonymous (unverified), submitted 2019-12-03 01:34:02

Question: When I run a Docker image using KubernetesPodOperator in Airflow version 1.10 and the pod finishes the task successfully, Airflow tries to get the XCom value by making a connection to the pod via the k8s stream client. This is the error I encountered:

    [2018-12-18 05:29:02,209] {{models.py:1760}} ERROR - (0)
    Reason: Handshake status 403 Forbidden
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/site-packages/kubernetes/stream/ws_client.py", line 249, in websocket_call
        client = WSClient(configuration, get_websocket
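The traceback is cut off, but a 403 during this websocket handshake is usually reported as an RBAC problem: with XCom enabled, Airflow reads the result sidecar via a pods/exec call, so the service account the scheduler/worker runs under must be allowed to exec into pods; that grant lives in cluster RBAC, not in the DAG. A sketch of the Airflow side, assuming Airflow 1.10's contrib operator and hypothetical names:

    from datetime import datetime
    from airflow import DAG
    from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

    dag = DAG('k8s_xcom_example', start_date=datetime(2018, 1, 1),
              schedule_interval=None, catchup=False)

    push_from_pod = KubernetesPodOperator(
        task_id='push_from_pod',
        name='push-from-pod',
        namespace='default',
        image='python:3.6',
        cmds=['sh', '-c'],
        # with XCom enabled, the pod must write its JSON result here; the
        # sidecar's shared volume is mounted at /airflow/xcom automatically
        arguments=['echo \'{"answer": 42}\' > /airflow/xcom/return.json'],
        xcom_push=True,  # parameter name in 1.10.x; later renamed do_xcom_push
        dag=dag,
    )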

Apache Airflow - customize logging format

Anonymous (unverified), submitted 2019-12-03 01:34:02

Question: Is it possible to customize the format that Airflow uses for logging? I tried adding a LOG_FORMAT variable in $AIRFLOW_HOME/airflow.cfg, but it doesn't seem to take effect:

    LOG_FORMAT = "%(asctime)s logLevel=%(levelname)s logger=%(name)s - %(message)s"

Answer 1: You need to change the settings.py file in the airflow package to change the log format. Update settings.py (after LOGGING_LEVEL, add the line below):

    LOG_FORMAT = os.path.expanduser(conf.get('core', 'LOG_FORMAT'))

Then update the airflow.cfg configuration file, adding a line under [core]:

    LOG_FORMAT = "%
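The airflow.cfg line is cut off above. A runnable sketch of the question's format string outside Airflow, just to show what the pattern produces; note that if the same string is placed in airflow.cfg, ConfigParser requires each '%' to be escaped as '%%':

    import logging

    LOG_FORMAT = "%(asctime)s logLevel=%(levelname)s logger=%(name)s - %(message)s"
    logging.basicConfig(format=LOG_FORMAT, level=logging.INFO)
    logging.getLogger("airflow.example").info("formatted line")  # hypothetical logger name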

setting up s3 for logs in airflow

Anonymous (unverified), submitted 2019-12-03 01:27:01

Question: I am using docker-compose to set up a scalable Airflow cluster. I based my approach on this Dockerfile: https://hub.docker.com/r/puckel/docker-airflow/ My problem is getting the logs set up to write to / read from S3. When a DAG has completed, I get an error like this:

    *** Log file isn't local.
    *** Fetching here: http://ea43d4d49f35:8793/log/xxxxxxx/2017-06-26T11:00:00
    *** Failed to fetch log file from worker.
    *** Reading remote logs...
    Could not read logs from s3://buckets/xxxxxxx/airflow/logs/xxxxxxx/2017-06-26T11:00:00

I set up a new section
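The excerpt cuts off at "I set up a new section", but for the Airflow 1.8-era puckel image, remote S3 logging is normally configured in airflow.cfg plus an Airflow connection holding the AWS credentials. A sketch, assuming a connection named MyS3Conn and a hypothetical bucket; these keys must be set on every container (webserver, scheduler, workers):

    [core]
    # scheme-qualified bucket path, readable and writable by all workers
    remote_base_log_folder = s3://my-bucket/airflow/logs
    remote_log_conn_id = MyS3Conn
    encrypt_s3_logs = False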

Airflow: how to extend SubDagOperator?

Anonymous (unverified), submitted 2019-12-03 01:22:02

Question: When I try to extend the SubDagOperator provided in the Airflow API, the Airflow webserver GUI does not recognize it as a SubDagOperator, which prevents me from zooming in to the subdag. How can I extend SubDagOperator while preserving the ability to zoom in to it as a subdag? Am I missing something?

Answer 1: Please see the example below on how to extend the SubDagOperator. The key in your case is to override the task_type function:

    from airflow import DAG
    from airflow.operators.subdag_operator import SubDagOperator
    from airflow.operators.dummy_operator
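The code in Answer 1 is cut off, but the stated key, overriding task_type, is enough to reconstruct the pattern: the webserver decides zoom-ability from the recorded task type string, so the subclass should report itself as 'SubDagOperator'. A minimal sketch (the subclass name is hypothetical):

    from airflow.operators.subdag_operator import SubDagOperator

    class MySubDagOperator(SubDagOperator):
        @property
        def task_type(self):
            # make the UI treat this subclass exactly like the built-in
            # SubDagOperator, preserving the zoom-in link
            return 'SubDagOperator'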

Airflow S3KeySensor - How to make it continue running

Anonymous (unverified), submitted 2019-12-03 01:20:02

Question: With the help of this Stack Overflow post, I just made a program (the one shown in the post) where, when a file is placed inside an S3 bucket, a task in one of my running DAGs is triggered, and then I perform some work using the BashOperator. Once it's done, though, the DAG is no longer in a running state; instead it goes into a success state, and if I want it to pick up another file I need to clear all the 'Past', 'Future', 'Upstream', 'Downstream' activity. I would like to make this program so that it's always running, and anytime a new file
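The question is cut off, but one commonly suggested workaround (an assumption here, not taken from the post) is to let the DAG re-trigger itself after each file is processed, so a fresh run is always sitting on the sensor. A sketch with hypothetical names, using Airflow 1.x import paths:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    from airflow.operators.dagrun_operator import TriggerDagRunOperator
    from airflow.operators.sensors import S3KeySensor

    dag = DAG('s3_watcher', start_date=datetime(2018, 1, 1),
              schedule_interval=None, catchup=False)

    wait = S3KeySensor(task_id='wait_for_file',
                       bucket_key='s3://my-bucket/incoming/*',  # hypothetical bucket
                       wildcard_match=True, dag=dag)
    work = BashOperator(task_id='process_file',
                        bash_command='echo processing', dag=dag)
    # start a new run of this same DAG, which goes back to waiting
    retrigger = TriggerDagRunOperator(task_id='restart_self',
                                      trigger_dag_id='s3_watcher', dag=dag)
    wait >> work >> retrigger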

Airflow: dag_id could not be found

Anonymous (unverified), submitted 2019-12-03 01:18:02

Question: I'm running an Airflow server and worker on different AWS machines. I've synced the dags folder between them, run airflow initdb on both, and checked that the dag_ids are the same when I run airflow list_tasks <dag_id>. When I run the scheduler and worker, I get this error on the worker:

    airflow.exceptions.AirflowException: dag_id could not be found: . Either the dag did not exist or it failed to parse.
    [...]
    Command ...--local -sd /home/ubuntu/airflow/dags/airflow_tutorial.py'

What seems to be the problem is that the path there is wrong (
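The explanation is cut off mid-sentence, but the command line in the error hints at the usual cause: the scheduler renders the -sd/--subdir path from its own filesystem and ships it to the worker, so the same absolute dags path must resolve on both machines. A sketch of the usual check (the paths are this poster's; treating this as the fix is an assumption):

    # airflow.cfg, identical on the scheduler and worker machines
    [core]
    airflow_home = /home/ubuntu/airflow
    dags_folder = /home/ubuntu/airflow/dags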

Airflow ExternalTaskSensor gets stuck

Anonymous (unverified), submitted 2019-12-03 01:12:01

Question: I'm trying to use ExternalTaskSensor, and it gets stuck poking another DAG's task, which has already completed successfully. Here, a first DAG "a" completes its task, after which a second DAG "b" is supposed to be triggered through an ExternalTaskSensor. Instead, it gets stuck poking for a.first_task.

First DAG:

    import datetime
    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    dag = DAG(
        dag_id='a',
        default_args={'owner': 'airflow', 'start_date': datetime.datetime.now()},
        schedule_interval=None
    )
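The excerpt stops at DAG "a", but the symptom described is the classic execution-date mismatch: ExternalTaskSensor pokes for a run of the external task with the same execution_date as its own, so two DAGs whose schedules differ (here "a" has schedule_interval=None and start_date=datetime.now()) never line up. A sketch of the usual remedy, assuming Airflow 1.10 import paths and hypothetical dates:

    import datetime
    from airflow import DAG
    from airflow.sensors.external_task_sensor import ExternalTaskSensor

    dag = DAG(
        dag_id='b',
        default_args={'owner': 'airflow',
                      'start_date': datetime.datetime(2019, 1, 1)},
        schedule_interval='@daily',
    )

    wait_for_a = ExternalTaskSensor(
        task_id='wait_for_a_first_task',
        external_dag_id='a',
        external_task_id='first_task',
        # offset between the two DAGs' execution dates; the sensor's own
        # execution_date minus this delta must hit a real run of 'a'
        execution_delta=datetime.timedelta(hours=0),
        dag=dag,
    )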