airflow

How to export large data from Postgres to S3 using Cloud Composer?

六眼飞鱼酱① submitted on 2020-05-15 18:33:04
Problem: I have been using the Postgres to S3 operator to load data from Postgres to S3. But recently, I had to export a very large table and my Airflow Composer environment fails without any logs. This could be because we are using the NamedTemporaryFile function of Python's tempfile module to create a temporary file, and we are using this temporary file to load to S3. Since we are using Composer, this will be loaded into Composer's local memory, and since the size of the file is very large, it is failing. Refer
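One workaround (a minimal sketch, not the original poster's code) is to stream the table with a server-side cursor and upload it in chunks, so the full export never has to fit in the worker's memory. The connection ids, bucket, key prefix and table name below are hypothetical placeholders.

import csv
import io

from airflow.hooks.postgres_hook import PostgresHook
from airflow.hooks.S3_hook import S3Hook

def export_table_to_s3(**context):
    pg = PostgresHook(postgres_conn_id='my_postgres')   # hypothetical connection id
    s3 = S3Hook(aws_conn_id='my_s3')                    # hypothetical connection id
    conn = pg.get_conn()
    # A named cursor is a server-side cursor: rows are fetched lazily.
    cursor = conn.cursor(name='export_cursor')
    cursor.itersize = 50000
    cursor.execute('SELECT * FROM my_schema.my_table')
    chunk = 0
    while True:
        rows = cursor.fetchmany(50000)
        if not rows:
            break
        buf = io.StringIO()
        csv.writer(buf).writerows(rows)
        # Each chunk becomes its own S3 object, so memory use stays bounded.
        s3.load_string(buf.getvalue(),
                       key='exports/my_table/part_{:05d}.csv'.format(chunk),
                       bucket_name='my-bucket',
                       replace=True)
        chunk += 1
    cursor.close()
    conn.close()

Wrapped in a PythonOperator, this keeps each chunk small; the chunk size and CSV formatting would need to match whatever consumes the S3 files downstream.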

Why does Apache airflow fail with the command: 'airflow initdb'?

时间秒杀一切 submitted on 2020-05-15 17:44:31
Problem: I am trying to install Airflow on an AWS EC2 instance. The process seems to be pretty well documented by various sources on the web; however, I have run into a problem after I 'pip install' airflow: I get the below error when I execute the command 'airflow initdb': [2019-09-25 13:22:02,329] {__init__.py:51} INFO - Using executor SequentialExecutor Traceback (most recent call last): File "/home/cloud-user/.local/bin/airflow", line 22, in <module> from airflow.bin.cli import CLIFactory File "

Can I programmatically determine if an Airflow DAG was scheduled or manually triggered?

血红的双手。 submitted on 2020-05-15 09:37:05
Problem: I want to create a snippet that passes the correct date based on whether the DAG was scheduled or triggered manually. The DAG runs monthly and generates a report (a SQL query) based on the data of the previous month. If the DAG runs on its schedule, I can fetch the previous month with the following Jinja snippet: execution_date.month Given that the DAG is scheduled at the end of the previous period (last month), the execution_date will correctly return the last month. However on
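A minimal sketch of one way to do this, assuming the Airflow 1.10 task context: dag_run.external_trigger is False for scheduled runs and True for manual ones, so a PythonOperator callable can pick the right month (the task names and the `dag` object below are placeholders).

from datetime import timedelta
from airflow.operators.python_operator import PythonOperator

def pick_report_month(**context):
    dag_run = context['dag_run']
    exec_date = context['execution_date']
    if dag_run is not None and dag_run.external_trigger:
        # Manual trigger: execution_date is roughly "now", so step back
        # to the previous month explicitly.
        return (exec_date.replace(day=1) - timedelta(days=1)).month
    # Scheduled run: execution_date already marks the previous period,
    # so its month is the month the report should cover.
    return exec_date.month

pick_month = PythonOperator(
    task_id='pick_report_month',
    python_callable=pick_report_month,
    provide_context=True,
    dag=dag,  # assumes an existing DAG object named `dag`
)

The same flag is also available in templates as {{ dag_run.external_trigger }} if the check needs to live in a templated field instead of a callable.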

How to manage python packages between airflow dags?

こ雲淡風輕ζ submitted on 2020-05-15 06:25:25
Problem: If I have multiple Airflow DAGs with some overlapping Python package dependencies, how can I keep each project's dependencies decoupled? E.g., if I had projects A and B on the same server, I would run each of them with something like... source /path/to/virtualenv_a/activate python script_a.py deactivate source /path/to/virtualenv_b/activate python script_b.py deactivate Basically, I would like to run DAGs in the same situation (e.g. each DAG uses Python scripts that may have overlapping package
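One option (a sketch, not a drop-in answer) is the PythonVirtualenvOperator, which builds a throwaway virtualenv per task so each DAG can pin its own package versions; the requirement pin and the `dag` object below are placeholders.

from airflow.operators.python_operator import PythonVirtualenvOperator

def run_script_a():
    # Imports must live inside the callable, because it runs in the
    # freshly created virtualenv, not in the worker's own environment.
    import pandas as pd
    print(pd.__version__)

script_a_task = PythonVirtualenvOperator(
    task_id='script_a',
    python_callable=run_script_a,
    requirements=['pandas==0.25.3'],   # hypothetical pin for project A
    system_site_packages=False,
    dag=dag,
)

The trade-off is that the virtualenv is rebuilt on every run; for heavier dependency sets, running each project in its own container (e.g. via the KubernetesPodOperator or DockerOperator) isolates environments without the per-run install cost.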

How to store SQL output to a pandas dataframe using Airflow?

烂漫一生 submitted on 2020-05-14 18:42:05
Problem: I want to store data from SQL to a pandas dataframe, do some data transformations, and then load it to another table using Airflow. The issue I am facing is that the connection strings to the tables are accessible only through Airflow, so I need to use Airflow as the medium to read and write data. How can this be done? My code: Task1 = PostgresOperator( task_id='Task1', postgres_conn_id='REDSHIFT_CONN', sql="SELECT * FROM Western.trip limit 5 ", params={'limit': '50'}, dag=dag The output of task needs to be
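A sketch of the usual pattern, assuming the same REDSHIFT_CONN connection: pull the query into a dataframe with PostgresHook.get_pandas_df inside a PythonOperator, transform it, and write it back through the same hook. The destination table and the transformation are placeholders.

from airflow.hooks.postgres_hook import PostgresHook
from airflow.operators.python_operator import PythonOperator

def transform_and_load(**context):
    hook = PostgresHook(postgres_conn_id='REDSHIFT_CONN')
    df = hook.get_pandas_df("SELECT * FROM Western.trip LIMIT 5")
    # ... pandas transformations go here ...
    df['trip_flag'] = 1  # placeholder transformation
    # Write the transformed rows back; the destination table is hypothetical.
    hook.insert_rows(table='Western.trip_transformed',
                     rows=df.itertuples(index=False, name=None),
                     target_fields=list(df.columns))

Task2 = PythonOperator(
    task_id='Task2',
    python_callable=transform_and_load,
    provide_context=True,
    dag=dag,
)

For Redshift specifically, insert_rows issues row-by-row INSERTs, so for anything beyond small tables it is usually faster to write the dataframe to S3 and COPY it in.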

How to Trigger a DAG on the success of another DAG in Airflow using Python?

跟風遠走 submitted on 2020-05-14 17:47:57
Problem: I have a Python DAG Parent Job and a DAG Child Job. The tasks in the Child Job should be triggered on the successful completion of the Parent Job tasks, which run daily. How can I add an external job trigger? My code: from datetime import datetime, timedelta from airflow import DAG from airflow.operators.postgres_operator import PostgresOperator from utils import FAILURE_EMAILS yesterday = datetime.combine(datetime.today() - timedelta(1), datetime.min.time()) default_args = { 'owner': 'airflow',
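A minimal sketch of one approach: put a TriggerDagRunOperator at the end of the parent DAG so the child DAG fires only after the parent's last task succeeds (the dag ids and task names below are placeholders).

from airflow.operators.dagrun_operator import TriggerDagRunOperator

# Added to the parent DAG, downstream of its final real task.
trigger_child = TriggerDagRunOperator(
    task_id='trigger_child_job',
    trigger_dag_id='child_job',   # dag_id of the child DAG
    dag=dag,
)
last_parent_task >> trigger_child   # `last_parent_task` is a placeholder

The inverse pattern also works: keep the DAGs independent and start the child with an ExternalTaskSensor that waits for the parent's final task to reach the success state.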

Where is Airflow webserver running on Google Composer?

我是研究僧i submitted on 2020-05-14 09:05:09
Problem: I have the following pods: NAME READY STATUS RESTARTS AGE airflow-database-init-job-ggk95 0/1 Completed 0 3h airflow-redis-0 1/1 Running 0 3h airflow-scheduler-7594cd584-mlfrt 2/2 Running 9 3h airflow-sqlproxy-74f64b8b97-csl8h 1/1 Running 0 3h airflow-worker-5fcd4fffff-7w2sg 2/2 Running 0 3h airflow-worker-5fcd4fffff-m44bs 2/2 Running 0 3h airflow-worker-5fcd4fffff-mm55s 2/2 Running 0 3h composer-agent-0034135a-3fed-49a6-b173-9d3f9d0569db-ktwwt 0/1 Completed 0 3h composer-agent-0034135a-3fed-49a6

How to mount volume of airflow worker to airflow kubernetes pod operator?

十年热恋 submitted on 2020-05-13 07:27:27
Problem: I am trying to use the KubernetesPodOperator in Airflow, and there is a directory on my Airflow worker that I wish to share with the Kubernetes pod. Is there a way to mount the Airflow worker's directory into the Kubernetes pod? I tried the code below, but the volume does not seem to be mounted successfully. import datetime import unittest from unittest import TestCase from airflow.operators.kubernetes_pod_operator import KubernetesPodOperator from airflow.kubernetes.volume import Volume from airflow
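For reference, a sketch of how volumes are usually declared for the KubernetesPodOperator in Airflow 1.10, using a hostPath volume; the paths, image and names are placeholders. Note that hostPath exposes the node's filesystem, so it only reaches the worker's directory if the pod is scheduled on the same node (and the worker itself is not containerized); a shared PersistentVolume is the more reliable way to share data.

from airflow.operators.kubernetes_pod_operator import KubernetesPodOperator
from airflow.kubernetes.volume import Volume
from airflow.kubernetes.volume_mount import VolumeMount

shared_volume = Volume(
    name='shared-data',
    configs={'hostPath': {'path': '/home/airflow/shared', 'type': 'Directory'}},
)
shared_mount = VolumeMount(
    name='shared-data',
    mount_path='/shared',   # path inside the launched pod
    sub_path=None,
    read_only=False,
)

pod_task = KubernetesPodOperator(
    task_id='pod_with_shared_dir',
    name='pod-with-shared-dir',
    namespace='default',
    image='python:3.7-slim',
    cmds=['ls', '/shared'],
    volumes=[shared_volume],
    volume_mounts=[shared_mount],
    dag=dag,
)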