airflow

Debugging Broken DAGs

感情迁移 submitted on 2019-12-03 06:09:59
When the Airflow webserver shows errors like Broken DAG: [<path/to/dag>] <error>, how and where can we find the full stack trace for these exceptions? I tried these locations: /var/log/airflow/webserver -- had no logs in the timeframe of execution; the other logs were binary, and decoding them with strings gave no useful information. /var/log/airflow/scheduler -- had some logs, but they were in binary form; reading them, they looked to be mostly sqlalchemy logs, probably for Airflow's database. /var/log/airflow/worker -- shows the logs for running DAGs (same as the ones you see on the airflow
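One way to surface the full stack trace (a sketch, not from the original post, assuming Airflow is importable and the dags folder path matches your installation) is to load the folder through DagBag and print its import errors:

from airflow.models import DagBag

# Load the dags folder the same way the webserver/scheduler does.
dag_bag = DagBag(dag_folder="/path/to/dags", include_examples=False)

# import_errors maps each broken DAG file to the full traceback string
# that the "Broken DAG" banner only summarizes.
for dag_file, stacktrace in dag_bag.import_errors.items():
    print("Broken DAG file:", dag_file)
    print(stacktrace)

Running the DAG file directly with python <path/to/dag> usually surfaces the same traceback.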

Airflow: pass {{ ds }} as param to PostgresOperator

江枫思渺然 submitted on 2019-12-03 05:51:43
I would like to use the execution date as a parameter to my SQL file. I tried dt = '{{ ds }}' s3_to_redshift = PostgresOperator( task_id='s3_to_redshift', postgres_conn_id='redshift', sql='s3_to_redshift.sql', params={'file': dt}, dag=dag ) but it doesn't work. dt = '{{ ds }}' doesn't work because Jinja (the templating engine used within Airflow) does not process the entire DAG definition file. For each operator there are specific fields which Jinja will process, and they are part of the definition of the operator itself. In this case, you can make the params field (which is actually called parameters, make
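A minimal sketch of one working approach (not the original answer's exact code; dag is assumed to be an existing DAG object): because sql is one of PostgresOperator's templated fields, the SQL file itself can reference {{ ds }} directly, so no params indirection is needed.

from airflow.operators.postgres_operator import PostgresOperator

# s3_to_redshift.sql can reference the macro directly, e.g.:
#   COPY my_table FROM 's3://my-bucket/{{ ds }}/data.csv' ...;
# The whole file is rendered by Jinja before the query is sent to the database.
s3_to_redshift = PostgresOperator(
    task_id='s3_to_redshift',
    postgres_conn_id='redshift',
    sql='s3_to_redshift.sql',
    dag=dag,
)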

Error while installing airflow: By default one of Airflow's dependencies installs a GPL

跟風遠走 submitted on 2019-12-03 05:27:55
Question: Getting the following error after running the pip install airflow[postgres] command: raise RuntimeError("By default one of Airflow's dependencies installs a GPL " RuntimeError: By default one of Airflow's dependencies installs a GPL dependency (unidecode). To avoid this dependency set SLUGIFY_USES_TEXT_UNIDECODE=yes in your environment when you install or upgrade Airflow. To force installing the GPL version set AIRFLOW_GPL_UNIDECODE I am trying to install on Debian 9. Answer 1: Try the following: export
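The error message itself names the fix: set SLUGIFY_USES_TEXT_UNIDECODE=yes before installing. The shell equivalent is simply export SLUGIFY_USES_TEXT_UNIDECODE=yes followed by the pip command; below is a small, illustrative sketch of doing the same from Python (the package name apache-airflow is the modern PyPI name, the question used airflow[postgres]).

import os
import subprocess
import sys

# Set the flag named in the error message, then install in the same environment.
env = dict(os.environ, SLUGIFY_USES_TEXT_UNIDECODE="yes")
subprocess.check_call(
    [sys.executable, "-m", "pip", "install", "apache-airflow[postgres]"],
    env=env,
)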

airflow trigger_dag execution_date is the next day, why?

余生颓废 submitted on 2019-12-03 05:17:57
Recently I have been testing Airflow a lot and ran into a problem with execution_date when running airflow trigger_dag <my-dag>. I have learned from here that execution_date is not what you might expect at first: Airflow was developed as a solution for ETL needs. In the ETL world, you typically summarize data. So, if I want to summarize data for 2016-02-19, I would do it at 2016-02-20 midnight GMT, which would be right after all data for 2016-02-19 becomes available. start_date = datetime.combine(datetime.today(), datetime.min.time()) args = { "owner": "xigua", "start_date": start_date } dag = DAG
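To illustrate the convention the quote describes (a sketch with assumed names, not the poster's DAG): with a daily schedule, the run labeled with execution_date 2016-02-19 is only started once that whole interval has passed, i.e. just after 2016-02-20 00:00 UTC.

from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

dag = DAG(
    dag_id="execution_date_demo",        # illustrative DAG id
    start_date=datetime(2016, 2, 19),
    schedule_interval="@daily",
)

# The scheduler starts this task shortly after 2016-02-20 00:00 UTC, but the
# run's execution_date (and {{ ds }}) is 2016-02-19 -- the interval it covers.
noop = DummyOperator(task_id="noop", dag=dag)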

Statistics on Airflow DAG execution times

爷，独闯天下 submitted on 2019-12-03 04:21:20
Run the following SQL:

SELECT execution_date,
       MIN(start_date) AS start,
       MAX(end_date) AS end,
       MAX(end_date) - MIN(start_date) AS duration
FROM task_instance
WHERE dag_id = 'HKG042476_Woocommerce_Handle'
  AND state = 'success'
GROUP BY execution_date
ORDER BY execution_date DESC

Source: https://www.cnblogs.com/lshan/p/11776800.html

Using Dataflow vs. Cloud Composer

前提是你 submitted on 2019-12-03 03:52:59
I apologize for this naive question, but I'd like some clarification on whether Cloud Dataflow or Cloud Composer is the right tool for the job, and it wasn't clear to me from the Google documentation. Currently, I'm using Cloud Dataflow to read a non-standard CSV file -- do some basic processing -- and load it into BigQuery. Let me give a very basic example: # file.csv type\x01date house\x0112/27/1982 car\x0111/9/1889 From this file we detect the schema and create a BigQuery table, something like this: `table` type (STRING) date (DATE) And, we also format our data to insert (in python) into
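As a side note on the file format itself (a sketch based only on the sample above, not on the poster's pipeline): a \x01-delimited file with a header row can be parsed with the standard csv module by overriding the delimiter.

import csv

# Parse the \x01-delimited sample; the first row is treated as the header.
with open("file.csv", newline="") as f:
    reader = csv.reader(f, delimiter="\x01")
    header = next(reader)                 # ['type', 'date']
    rows = [dict(zip(header, row)) for row in reader]

print(rows)  # [{'type': 'house', 'date': '12/27/1982'}, {'type': 'car', 'date': '11/9/1889'}]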

Airflow using template files for PythonOperator

余生颓废 submitted on 2019-12-03 03:49:33
Question: Getting a BashOperator or SqlOperator to pick up an external file for its template is fairly clearly documented, but for the PythonOperator my test of what I understand from the docs is not working. I am not sure how the templates_exts and templates_dict parameters are supposed to interact to pick up a file. In my dags folder I've created pyoptemplate.sql and pyoptemplate.t as well as test_python_operator_template.py. pyoptemplate.sql: SELECT * FROM {{params.table}};
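A minimal sketch of how these two parameters are meant to work together (assumptions: Airflow 1.x import paths, an existing dag object, and the pyoptemplate.sql file from the question sitting in the dags folder): values in templates_dict are Jinja-rendered, and templates_exts tells the operator which of those values to treat as file names whose contents should be loaded and rendered.

from airflow.operators.python_operator import PythonOperator

def print_rendered_sql(templates_dict=None, **kwargs):
    # By execution time, templates_dict['query'] holds the *rendered contents*
    # of pyoptemplate.sql, e.g. "SELECT * FROM my_table;"
    print(templates_dict['query'])

render_sql = PythonOperator(
    task_id='render_sql',
    python_callable=print_rendered_sql,
    provide_context=True,
    templates_dict={'query': 'pyoptemplate.sql'},  # value matches a file name
    templates_exts=['.sql'],                       # so it is read and rendered
    params={'table': 'my_table'},                  # hypothetical table name
    dag=dag,
)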

Airflow inside docker running a docker container

☆樱花仙子☆ submitted on 2019-12-03 03:45:14
I have Airflow running on an EC2 instance, and I am scheduling some tasks that spin up a Docker container. How do I do that? Do I need to install Docker in my Airflow container? And what is the next step after that? I have a yaml file that I am using to spin up the container, and it is derived from the puckel/airflow Docker image. Finally resolved: my EC2 setup runs Ubuntu Xenial 16.04 and uses a modified puckel/airflow Docker image that is running Airflow. Things you will need to change in the Dockerfile: add USER root at the top of the Dockerfile. Mounting the docker bin was not
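Once the Airflow container can reach the Docker daemon (e.g. with /var/run/docker.sock mounted into it, in the spirit of the answer above), one common way to start a container from a task is the built-in DockerOperator. This is a sketch with an illustrative image and an assumed existing dag object, not the poster's exact setup.

from airflow.operators.docker_operator import DockerOperator

spin_up = DockerOperator(
    task_id='spin_up_container',
    image='alpine:3.10',                        # illustrative image
    command='echo hello from inside the container',
    docker_url='unix://var/run/docker.sock',    # socket mounted from the host
    auto_remove=True,                           # clean up the container afterwards
    dag=dag,
)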

Airflow default on_failure_callback

Anonymous (unverified) submitted on 2019-12-03 03:00:02
Question: In my DAG file, I have defined an on_failure_callback() function to post a Slack message in case of failure. It works well if I specify it for each operator in my DAG: on_failure_callback=on_failure_callback() Is there a way to automate the dispatch to all of my operators (via default_args for instance, or via my DAG object)? Answer 1: I finally found a way to do that. You can pass your on_failure_callback in default_args: class Foo: @staticmethod def get_default_args(): """ Return default args :return: default_args """ default_args = { 'on_failure_callback
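A compact sketch of that approach (the names and the callback body are illustrative, not the answer's exact code): because default_args are applied to every operator the DAG creates, the callback only needs to be declared once.

from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

def on_failure_callback(context):
    # context carries task_instance, execution_date, the exception, etc.;
    # a real implementation would post to Slack here.
    print("Task failed:", context['task_instance'].task_id)

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2019, 1, 1),
    'on_failure_callback': on_failure_callback,  # inherited by every operator
}

dag = DAG('failure_callback_demo', default_args=default_args,
          schedule_interval='@daily')

noop = DummyOperator(task_id='noop', dag=dag)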