google-cloud-composer

Is it possible to turn off the VMs hosting google-cloud-composer at certain hours?

谁说我不能喝 submitted on 2019-12-13 03:25:05
Question: In order to reduce the billing associated with running google-cloud-composer, I am wondering about the possibility of turning off the VM instances that run the environment at certain hours. For example, most of our DAGs run either in the morning or the afternoon, so we would like to turn off the VMs during the night, or even during midday if possible. I know we can disable the environments manually from the Google Cloud console, but it would be great to find a way to do this…
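One approach (an assumption, not an official Composer feature) is to scale the environment's GKE node pool down to zero outside working hours and back up before the first DAGs start, for example from a cron job or Cloud Scheduler. A minimal sketch driving the gcloud CLI from Python; the cluster, node-pool and zone names are placeholders you would replace with the values of your own environment:

import subprocess

def resize_composer_nodes(num_nodes):
    # Resize the GKE node pool that backs a Composer environment.
    # Assumes the gcloud CLI is installed and authenticated; all names are hypothetical.
    subprocess.run(
        [
            "gcloud", "container", "clusters", "resize",
            "my-composer-gke-cluster",            # cluster created for the environment
            "--node-pool", "default-pool",
            "--num-nodes", str(num_nodes),
            "--zone", "us-central1-a",
            "--quiet",
        ],
        check=True,
    )

# Scale down at night and back up in the morning, e.g. from two scheduled jobs:
# resize_composer_nodes(0)
# resize_composer_nodes(3)

Note that the managed components of the environment (such as the Cloud SQL metadata database and the webserver) keep running, so this only reduces part of the cost.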

How do I queue up backfills in airflow?

前提是你 submitted on 2019-12-11 23:11:37
Question: I have a DAG where max_active_runs is set to 2, but now I want to run backfills for roughly 20 runs. I expected Airflow to schedule all of the backfill runs but only start 2 at a time; that doesn't seem to happen. When I run the backfill command it starts two, but the command doesn't return, since it didn't manage to start them all; instead it keeps retrying until it succeeds. What I expected was this: I run the backfill command, all the runs are marked as running, and the command returns…
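For reference, a minimal sketch of the setup being described, with the backfill invocation in a comment; the DAG id, dates and schedule are made up for illustration (Airflow 1.x CLI syntax):

from datetime import datetime
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

# Only two DAG runs may be active at once; further backfill runs have to wait.
dag = DAG(
    dag_id="my_backfill_dag",               # hypothetical DAG id
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
    max_active_runs=2,
)

work = DummyOperator(task_id="do_work", dag=dag)

# The backfill itself is started from the CLI and blocks until every run
# in the window has finished, which is the behaviour the question observes:
#   airflow backfill my_backfill_dag -s 2019-01-01 -e 2019-01-20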

Loop over Airflow variables issue

回眸只為那壹抹淺笑 submitted on 2019-12-11 16:47:31
Question: I am having a hard time looping over an Airflow variable in my script. I have a requirement to list all files prefixed by a string in a bucket, then loop through them and do some operations. I tried making use of XCom and SubDAGs but I couldn't figure it out, so I came up with a new approach. It involves 2 scripts, though: the first script sets the Airflow variable with a value I generate. Below is the code: #!/usr/bin/env python with DAG('Test_variable', default_args=default_args, schedule_interval…
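A sketch of the two-script Variable approach the question describes: one task lists the objects and stores them as JSON in an Airflow Variable, and the second script reads the Variable back and loops over it. The bucket, prefix and variable names are placeholders, and the sketch assumes the google-cloud-storage client library is installed:

import json
from airflow.models import Variable
from google.cloud import storage

def publish_file_list(**_):
    # List objects under a (hypothetical) prefix and store them in a Variable.
    client = storage.Client()
    blobs = client.list_blobs("my-bucket", prefix="incoming/")
    Variable.set("file_list", json.dumps([b.name for b in blobs]))

# In the second script, read the Variable back and loop over the file names:
for file_name in json.loads(Variable.get("file_list", default_var="[]")):
    print("would process", file_name)    # replace with real task creation / processing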

Creating dynamic tasks in Airflow (in Composer) based on a BigQuery response

怎甘沉沦 submitted on 2019-12-11 16:25:01
Question: I am trying to create an Airflow DAG which generates tasks depending on the response from a server. Here is my approach: get the list of tables from BigQuery -> loop through the list and create tasks. This is my latest code, and I have tried all the variants I found on Stack Overflow; nothing seems to work. What am I doing wrong? with models.DAG(dag_id="xt", default_args=default_args, schedule_interval="0 1 * * *", catchup=True) as dag: tables = get_tables_from_bq() bridge = DummyOperator( task_id=…
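A sketch of the pattern, under the assumption that the table list can be fetched at DAG-parse time (the call must be cheap and deterministic, because the scheduler re-parses the file regularly); the project, dataset and task logic are placeholders:

from datetime import datetime
from airflow import models
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import PythonOperator
from google.cloud import bigquery   # assumes the BigQuery client library is available

def get_tables_from_bq():
    # Return the table ids of a (hypothetical) dataset.
    client = bigquery.Client()
    return [t.table_id for t in client.list_tables("my_project.my_dataset")]

default_args = {"start_date": datetime(2019, 1, 1)}

with models.DAG(dag_id="xt", default_args=default_args,
                schedule_interval="0 1 * * *", catchup=False) as dag:
    bridge = DummyOperator(task_id="bridge")

    # One task per table, created while the DAG file is being parsed.
    for table in get_tables_from_bq():
        PythonOperator(
            task_id="process_{}".format(table),
            python_callable=lambda tbl=table: print("processing", tbl),
        ) >> bridge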

Passing typesafe config conf files to DataProcSparkOperator

℡╲_俬逩灬. submitted on 2019-12-11 12:42:37
Question: I am using Google Dataproc to submit Spark jobs and Google Cloud Composer to schedule them. Unfortunately, I am facing difficulties. I am relying on .conf files (Typesafe config files) to pass arguments to my Spark jobs. I am using the following Python code for the Airflow Dataproc operator: t3 = dataproc_operator.DataProcSparkOperator( task_id='execute_spark_job_cluster_test', dataproc_spark_jars='gs://snapshots/jars/pubsub-assembly-0.1.14-SNAPSHOT.jar', cluster_name='cluster', main_class='com…
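One way to hand a Typesafe .conf file to the job is to ship it alongside the job with the operator's files argument and point the driver at it through -Dconfig.file. A sketch against the Airflow 1.10 contrib operator; the DAG, bucket path and main class are placeholders:

from datetime import datetime
from airflow import DAG
from airflow.contrib.operators import dataproc_operator

with DAG("spark_conf_example", start_date=datetime(2019, 1, 1),
         schedule_interval=None) as dag:
    t3 = dataproc_operator.DataProcSparkOperator(
        task_id="execute_spark_job_cluster_test",
        cluster_name="cluster",
        main_class="com.example.Main",                    # hypothetical main class
        dataproc_spark_jars=["gs://snapshots/jars/pubsub-assembly-0.1.14-SNAPSHOT.jar"],
        files=["gs://my-bucket/conf/application.conf"],   # staged into the job's working dir
        dataproc_spark_properties={
            # Typesafe config then loads the staged file by name.
            "spark.driver.extraJavaOptions": "-Dconfig.file=application.conf",
        },
    )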

How can I restart the Airflow webserver on Google Cloud Composer?

我是研究僧i submitted on 2019-12-10 15:55:00
Question: When I need to restart the webserver locally I do ps -ef | grep airflow | awk '{print $2}' | xargs kill -9 followed by airflow webserver -p 8080 -D. How can I do this on Google Cloud Composer? I don't see an option to restart the server in the console. Answer 1: Since Cloud Composer is a managed Apache Airflow service, it is not possible to restart the whole service. You can, however, restart single instances of the service, as described here, but this will not help to apply plugin changes. To apply the…
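One workaround, built on the assumption that any environment update recreates the Airflow components, is to push a throw-away environment variable through gcloud; the environment name and location below are placeholders:

import subprocess
import time

def nudge_composer_environment(env_name, location):
    # Force an environment update by setting a throw-away environment variable.
    # Assumption: the update recreates the Airflow scheduler, workers and webserver.
    subprocess.run(
        [
            "gcloud", "composer", "environments", "update", env_name,
            "--location", location,
            "--update-env-variables",
            "LAST_RESTART={}".format(int(time.time())),
        ],
        check=True,
    )

# nudge_composer_environment("my-composer-env", "us-central1")   # hypothetical names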

How to pass dynamic arguments to an Airflow operator?

主宰稳场 submitted on 2019-12-10 10:23:44
Question: I am using Airflow to run Spark jobs on Google Cloud Composer. I need to create a cluster (YAML parameters supplied by the user) and a list of Spark jobs (job params also supplied by a per-job YAML). With the Airflow API I can read YAML files and push variables across tasks using XCom. But consider DataprocClusterCreateOperator(): cluster_name, project_id, zone and a few other arguments are marked as templated. What if I want to pass in other arguments as templated (which currently are not)? Like…
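The usual pattern is to subclass the operator and extend its template_fields so that extra arguments are rendered by Jinja as well. A sketch against the Airflow 1.10 contrib import path; storage_bucket is just an example of a field that is not templated out of the box, and the commented usage relies on a hypothetical Airflow Variable:

from airflow.contrib.operators.dataproc_operator import DataprocClusterCreateOperator

class TemplatedDataprocClusterCreateOperator(DataprocClusterCreateOperator):
    # Add extra attribute names to the ones the base operator already templates.
    template_fields = list(DataprocClusterCreateOperator.template_fields) + ["storage_bucket"]

# Usage: any Jinja expression passed for storage_bucket is now rendered at runtime.
# create_cluster = TemplatedDataprocClusterCreateOperator(
#     task_id="create_cluster",
#     cluster_name="cluster-{{ ds_nodash }}",
#     project_id="my-project",
#     num_workers=2,
#     zone="us-central1-a",
#     storage_bucket="{{ var.value.staging_bucket }}",
# )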

Airflow: mark a task as success or skip it before the DAG runs

风格不统一 submitted on 2019-12-10 10:15:32
Question: We have a huge DAG, with many small and fast tasks and a few big and time-consuming tasks. We want to run just part of the DAG, and the easiest way we found is to not add the tasks that we don't want to run. The problem is that our DAG has many co-dependencies, so it became a real challenge not to break the DAG when we want to skip some tasks. Is there a way to add a status to a task by default (for every run)? Something like: # get the skip list from an env variable task_list =…
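A sketch of one way to approximate this: read a skip list from an Airflow Variable and raise AirflowSkipException inside the task when its id is on the list; downstream tasks then need a trigger rule such as none_failed so the skip does not cascade. All names are placeholders:

import json
from datetime import datetime
from airflow import DAG
from airflow.exceptions import AirflowSkipException
from airflow.models import Variable
from airflow.operators.python_operator import PythonOperator
from airflow.utils.trigger_rule import TriggerRule

def maybe_run(task_name, real_work, **_):
    # Skip this task if its name appears in the 'skip_list' Airflow Variable.
    skip_list = json.loads(Variable.get("skip_list", default_var="[]"))
    if task_name in skip_list:
        raise AirflowSkipException("{} is on the skip list".format(task_name))
    real_work()

with DAG("partial_run_example", start_date=datetime(2019, 1, 1),
         schedule_interval=None) as dag:
    heavy = PythonOperator(
        task_id="heavy_task",
        python_callable=maybe_run,
        op_kwargs={"task_name": "heavy_task",
                   "real_work": lambda: print("doing heavy work")},
    )
    downstream = PythonOperator(
        task_id="downstream_task",
        python_callable=lambda: print("runs even if heavy_task was skipped"),
        trigger_rule=TriggerRule.NONE_FAILED,   # tolerate skipped upstream tasks
    )
    heavy >> downstream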

Can you get a static external IP address for Google Cloud Composer / Airflow?

不打扰是莪最后的温柔 submitted on 2019-12-08 03:20:38
Question: I know how to assign a static external IP address to a Compute Engine instance, but can this be done with Google Cloud Composer (Airflow)? I'd imagine most companies need that functionality, since they'd generally be writing back to a warehouse that is possibly behind a firewall, but I can't find any docs on how to do this. Answer 1: It's not possible to assign a static IP to the underlying GKE cluster in a Composer environment. The endpoint @kaxil mentioned is the Kubernetes master endpoint, not the GKE nodes. If the intent is to let all outgoing network connections from Composer tasks use the same external…
