google-cloud-composer

Cloud Composer (Airflow) jobs stuck

允我心安 submitted on 2019-12-06 08:03:19
My Cloud Composer managed Airflow environment has been stuck for hours since I canceled a task instance that was taking too long (let's call it Task A). I've cleared all the DAG runs and task instances, but there are still a few jobs running and one job in the Shutdown state (I suppose the job of Task A) (snapshot of my Jobs). Besides, it seems that the scheduler is not running, since recently deleted DAGs keep appearing in the dashboard. Is there a way to kill the jobs or reset the scheduler? Any idea to un-stick Composer would be welcome. You can restart the scheduler as follows, from your Cloud Shell: 1
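A hedged sketch of what such a restart can look like, assuming the default Composer setup where the scheduler runs as a deployment in the environment's own GKE cluster (cluster, zone, project, namespace and pod names below are placeholders, not values from the original answer):

    # Point kubectl at the Composer environment's GKE cluster.
    gcloud container clusters get-credentials <composer-gke-cluster> \
        --zone <zone> --project <project-id>

    # Find the scheduler pod (its exact name and namespace vary per environment).
    kubectl get pods --all-namespaces | grep airflow-scheduler

    # Deleting the pod lets the deployment recreate it, i.e. restart the scheduler.
    kubectl delete pod <airflow-scheduler-pod-name> --namespace <composer-namespace>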

How to pass dynamic arguments to an Airflow operator?

橙三吉。 submitted on 2019-12-06 02:23:40
I am using Airflow to run Spark jobs on Google Cloud Composer. I need to: create a cluster (YAML parameters supplied by the user) and run a list of Spark jobs (job parameters also supplied by a per-job YAML). With the Airflow API I can read the YAML files and push variables across tasks using XCom. But consider DataprocClusterCreateOperator(): cluster_name, project_id, zone and a few other arguments are marked as templated. What if I want to pass in other arguments as templated (which currently are not), like image_version, num_workers, worker_machine_type, etc.? Is there any workaround for this? Not sure what
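A hedged sketch of one common workaround, not taken from the original answer: subclass the operator and extend its template_fields so the extra constructor arguments are also rendered by Jinja. The field names assume the Airflow 1.x contrib operator, and the dag_run.conf payload is hypothetical.

    from datetime import datetime

    from airflow import DAG
    from airflow.contrib.operators.dataproc_operator import DataprocClusterCreateOperator


    class TemplatedDataprocClusterCreateOperator(DataprocClusterCreateOperator):
        # Keep the operator's own templated fields and add the extra ones.
        template_fields = list(DataprocClusterCreateOperator.template_fields) + [
            'image_version', 'num_workers', 'worker_machine_type',
        ]


    dag = DAG('dataproc_templated_example',
              start_date=datetime(2019, 1, 1),
              schedule_interval=None)

    create_cluster = TemplatedDataprocClusterCreateOperator(
        task_id='create_cluster',
        cluster_name='etl-cluster-{{ ds_nodash }}',
        project_id='my-project',            # placeholder project
        zone='europe-west1-b',
        num_workers=2,                      # templates render as strings, so numeric
                                            # fields may need casting if templated
        image_version='{{ dag_run.conf["image_version"] }}',
        worker_machine_type='{{ dag_run.conf["worker_machine_type"] }}',
        dag=dag,
    )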

Airflow: mark a task as success or skip it before the DAG run

喜夏-厌秋 submitted on 2019-12-05 20:33:05
We have a huge DAG with many small and fast tasks and a few big, time-consuming tasks. We want to run just a part of the DAG, and the easiest way we found is to not add the tasks that we don't want to run. The problem is that our DAG has many co-dependencies, so it became a real challenge not to break the DAG when we want to skip some tasks. Is there a way to give a task a status by default (for every run)? Something like:

    # get the skip list from an env variable
    task_list = models.Variable.get('list_of_tasks_to_skip')
    dag.skip(task_list)

or:

    for task in task_list:
        task.status =
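There is no dag.skip() in the Airflow API; a hedged sketch of one possible pattern (the Variable name list_of_tasks_to_skip, the DAG and the task below are illustrative) is to have each skippable task check the list itself and raise AirflowSkipException, which marks the task as skipped. Downstream tasks may then need a trigger_rule such as 'none_failed' so they still run after a skipped parent.

    from datetime import datetime

    from airflow import DAG
    from airflow.exceptions import AirflowSkipException
    from airflow.models import Variable
    from airflow.operators.python_operator import PythonOperator


    def skippable(task_id, work_fn):
        """Wrap a task callable so the task skips itself when listed in the Variable."""
        def _callable(**context):
            skip_list = Variable.get('list_of_tasks_to_skip', default_var='').split(',')
            if task_id in skip_list:
                raise AirflowSkipException('%s is in the skip list' % task_id)
            return work_fn(**context)
        return _callable


    def heavy_work(**context):
        print('doing the big, time-consuming work')


    dag = DAG('partial_run_example',
              start_date=datetime(2019, 1, 1),
              schedule_interval=None)

    heavy_task = PythonOperator(
        task_id='heavy_task',
        python_callable=skippable('heavy_task', heavy_work),
        provide_context=True,
        dag=dag,
    )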

Running docker operator from Google Cloud Composer

荒凉一梦 submitted on 2019-12-04 10:19:52
According to the documentation, Google Cloud Composer's Airflow worker nodes are served from a dedicated Kubernetes cluster. I have a Docker-contained ETL step that I would like to run using Airflow, preferably on the same Kubernetes cluster that is hosting the workers, or on a dedicated cluster. What would be the best practice for starting a Docker operation from a Cloud Composer Airflow environment? Pragmatic solutions are ❤️ Google Cloud Composer has just recently been released into General Availability, and with that you are now able to use a KubernetesPodOperator to launch pods into the same GKE cluster that the
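A hedged sketch of what that can look like (the image, namespace and arguments are illustrative placeholders, not from the original answer):

    from datetime import datetime

    from airflow import DAG
    from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

    dag = DAG('docker_etl_example',
              start_date=datetime(2019, 1, 1),
              schedule_interval=None)

    # Launches the containerized ETL step as a pod in the same GKE cluster that
    # runs the Composer workers; image and namespace are placeholders.
    etl_step = KubernetesPodOperator(
        task_id='etl_step',
        name='etl-step',
        namespace='default',
        image='gcr.io/my-project/my-etl-image:latest',
        arguments=['--run-date', '{{ ds }}'],
        dag=dag,
    )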

Using Dataflow vs. Cloud Composer

前提是你 submitted on 2019-12-03 03:52:59
I apologize for this naive question, but I'd like to get some clarification on whether Cloud Dataflow or Cloud Composer is the right tool for the job, as I wasn't clear from the Google documentation. Currently, I'm using Cloud Dataflow to read a non-standard CSV file, do some basic processing, and load it into BigQuery. Let me give a very basic example:

    # file.csv
    type\x01date
    house\x0112/27/1982
    car\x0111/9/1889

From this file we detect the schema and create a BigQuery table, something like this:

    `table`
    type (STRING)
    date (DATE)

And we also format our data to insert (in Python) into
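For context, a hedged sketch of what that Dataflow step can look like with the Beam Python SDK (the file path, table name and fixed schema are assumptions for illustration; in the question's actual pipeline the schema is detected from the file):

    from datetime import datetime

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions


    def parse_line(line):
        type_value, date_value = line.split('\x01')
        # BigQuery DATE columns expect YYYY-MM-DD, so reformat values like 12/27/1982.
        iso_date = datetime.strptime(date_value, '%m/%d/%Y').date().isoformat()
        return {'type': type_value, 'date': iso_date}


    with beam.Pipeline(options=PipelineOptions()) as p:
        (p
         | 'Read' >> beam.io.ReadFromText('gs://my-bucket/file.csv', skip_header_lines=1)
         | 'Parse' >> beam.Map(parse_line)
         | 'Write' >> beam.io.WriteToBigQuery(
             'my-project:my_dataset.table',
             schema='type:STRING,date:DATE',
             create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
             write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))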

Google Cloud Composer BigQuery Operator- Get Jobs API HTTPError 404

懵懂的女人 submitted on 2019-12-02 13:21:25
Question: I am trying to run a BigQueryOperator on GCC (Google Cloud Composer). I have already succeeded in running BigQueryCreateEmptyTableOperator and BigQueryTableDeleteOperator. Here is my code for the DAG:

    import datetime
    import os
    import logging
    from airflow import configuration
    from airflow import models
    from airflow import DAG
    from airflow.operators import email_operator
    from airflow.contrib.operators import bigquery_operator
    from airflow.contrib.operators import bigquery_check_operator
    from airflow.utils import
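For reference, a hedged sketch of a minimal BigQueryOperator task using the contrib import above (the DAG, query and table names are illustrative placeholders, not the poster's actual code; on older Airflow releases the query parameter is bql rather than sql):

    import datetime

    from airflow import DAG
    from airflow.contrib.operators import bigquery_operator

    dag = DAG('bq_operator_example',
              start_date=datetime.datetime(2019, 1, 1),
              schedule_interval=None)

    bq_task = bigquery_operator.BigQueryOperator(
        task_id='bq_example',
        sql='SELECT COUNT(*) AS n FROM `my-project.my_dataset.my_table`',
        use_legacy_sql=False,
        destination_dataset_table='my-project.my_dataset.results',
        write_disposition='WRITE_TRUNCATE',
        dag=dag,
    )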

Google Cloud Composer and Google Cloud SQL

和自甴很熟 submitted on 2019-12-01 18:06:11
What ways do we have available to connect to a Google Cloud SQL (MySQL) instance from the newly introduced Google Cloud Composer? The intention is to get data from a Cloud SQL instance into BigQuery (perhaps with an intermediary step through Cloud Storage). Can the Cloud SQL proxy be exposed in some way on pods that are part of the Kubernetes cluster hosting Composer? If not, can the Cloud SQL Proxy be brought in by using the Kubernetes Service Broker? -> https://cloud.google.com/kubernetes-engine/docs/concepts/add-on/service-broker Should Airflow be used to schedule and call GCP API commands like 1)
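One commonly used pattern for the Cloud SQL to BigQuery part is to export via Cloud Storage with the contrib operators. A hedged sketch follows; the connection ID, bucket and table names are placeholders, and it assumes a MySQL connection that is reachable from the workers (for example through the Cloud SQL proxy):

    from datetime import datetime

    from airflow import DAG
    from airflow.contrib.operators.mysql_to_gcs import MySqlToGoogleCloudStorageOperator
    from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator

    dag = DAG('cloudsql_to_bq_example',
              start_date=datetime(2019, 1, 1),
              schedule_interval=None)

    # Export the table from Cloud SQL (via an Airflow MySQL connection) to GCS as JSON.
    export = MySqlToGoogleCloudStorageOperator(
        task_id='export_to_gcs',
        mysql_conn_id='cloudsql_mysql',            # hypothetical connection ID
        sql='SELECT * FROM my_table',
        bucket='my-staging-bucket',
        filename='exports/my_table/{{ ds }}/part-{}.json',
        schema_filename='exports/my_table/{{ ds }}/schema.json',
        dag=dag,
    )

    # Load the exported files from GCS into BigQuery.
    load = GoogleCloudStorageToBigQueryOperator(
        task_id='load_to_bq',
        bucket='my-staging-bucket',
        source_objects=['exports/my_table/{{ ds }}/part-*.json'],
        schema_object='exports/my_table/{{ ds }}/schema.json',
        destination_project_dataset_table='my-project.my_dataset.my_table',
        source_format='NEWLINE_DELIMITED_JSON',
        write_disposition='WRITE_TRUNCATE',
        dag=dag,
    )

    export >> load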
