Question
My use case involves fetching the job IDs of all streaming Dataflow jobs present in my project and cancelling them, then updating the sources for my Dataflow job and re-running it.
I am trying to achieve this using Python, but I have not come across any useful documentation so far. As a workaround I thought of using Python's subprocess library to execute the gcloud commands, but again I was not able to store the result and use it.
Can somebody please guide me on the best way of doing this?
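A minimal sketch of that subprocess workaround, assuming the gcloud CLI is installed and authenticated, with a placeholder region:

import json
import subprocess

# Run gcloud and capture its JSON output so the job IDs can be reused in Python
result = subprocess.run(
    ['gcloud', 'dataflow', 'jobs', 'list', '--status=active',
     '--region=<region>', '--format=json'],
    capture_output=True, text=True, check=True)
jobs = json.loads(result.stdout)
job_ids = [job['id'] for job in jobs]
print(job_ids)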
Answer 1:
In addition to using the REST API directly, you can use the generated Python bindings for the API in google-api-python-client. For simple calls it doesn't add much value, but when passing in many parameters it can be easier to work with than a raw HTTP library.
With that library, the jobs list call would look like this:
from googleapiclient.discovery import build
import google.auth

# Obtain application default credentials and the default project ID
credentials, project_id = google.auth.default(scopes=['https://www.googleapis.com/auth/cloud-platform'])

# Build a client for the Dataflow v1b3 API
df_service = build('dataflow', 'v1b3', credentials=credentials)

# List the jobs in the given region (the generated bindings use camelCase parameter names)
response = df_service.projects().locations().jobs().list(
    projectId=project_id,
    location='<region>').execute()
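To cover the cancellation part of the question, here is a minimal sketch assuming the same df_service client, project_id and '<region>' placeholder as above; per the Dataflow REST API, a job is cancelled by updating its requestedState:

# Request cancellation for every streaming job returned by the list call above
for job in response.get('jobs', []):
    if job.get('type') == 'JOB_TYPE_STREAMING':
        df_service.projects().locations().jobs().update(
            projectId=project_id,
            location='<region>',
            jobId=job['id'],
            body={'requestedState': 'JOB_STATE_CANCELLED'}).execute()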
Answer 2:
You can directly use the Dataflow REST API like this:
from google.auth.transport.requests import AuthorizedSession
import google.auth

base_url = 'https://dataflow.googleapis.com/v1b3/projects/'

# Obtain application default credentials (the project ID returned here is overridden below)
credentials, project_id = google.auth.default(scopes=['https://www.googleapis.com/auth/cloud-platform'])
project_id = 'PROJECT_ID'  # replace with your project ID
location = 'europe-west1'

# Call the Dataflow jobs.list endpoint with an authenticated session
authed_session = AuthorizedSession(credentials)
response = authed_session.request('GET', f'{base_url}{project_id}/locations/{location}/jobs')
print(response.json())
You have to install the google-auth dependency.
You can also add the query parameter ?filter=ACTIVE to get only the active Dataflow jobs, which should correspond to your streaming jobs.
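A minimal sketch of the same request with that filter applied, extracting the job IDs (the jobs and id field names follow the jobs.list response format):

# Reuse the authed_session from above and only request active jobs
response = authed_session.request(
    'GET', f'{base_url}{project_id}/locations/{location}/jobs?filter=ACTIVE')
active_job_ids = [job['id'] for job in response.json().get('jobs', [])]
print(active_job_ids)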
Source: https://stackoverflow.com/questions/62991245/how-to-list-down-all-the-dataflow-jobs-using-python-api