Question
My use case involves fetching the job IDs of all streaming Dataflow jobs present in my project and cancelling them, then updating the sources for my Dataflow job and re-running it.
I am trying to achieve this using Python, but I have not come across any useful documentation so far. As a workaround I thought of using Python's subprocess library to execute the gcloud commands, but again I was not able to store the result and use it.
Can somebody please guide me on the best way of doing this?
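A minimal sketch of that subprocess workaround, assuming the gcloud CLI is installed and authenticated, with a placeholder region:

import json
import subprocess

# Run gcloud and capture its JSON output so the job IDs can be reused in Python
result = subprocess.run(
    ['gcloud', 'dataflow', 'jobs', 'list', '--status=active',
     '--region=<region>', '--format=json'],
    capture_output=True, text=True, check=True)
jobs = json.loads(result.stdout)
job_ids = [job['id'] for job in jobs]
print(job_ids)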
Answer 1:
In addition to using the REST API directly, you can use the generated Python bindings for the API in google-api-python-client. For simple calls it doesn't add much value, but when passing in many parameters it can be easier to work with than a raw HTTP library.
With that library, the jobs list call would look like this:
from googleapiclient.discovery import build
import google.auth

# Obtain application default credentials and the default project ID
credentials, project_id = google.auth.default(scopes=['https://www.googleapis.com/auth/cloud-platform'])

# Build a client for the Dataflow v1b3 API
df_service = build('dataflow', 'v1b3', credentials=credentials)

# List the jobs in the given region (the generated bindings use camelCase parameter names)
response = df_service.projects().locations().jobs().list(
    projectId=project_id,
    location='<region>').execute()
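To cover the cancellation part of the question, here is a minimal sketch assuming the same df_service client, project_id and '<region>' placeholder as above; per the Dataflow REST API, a job is cancelled by updating its requestedState:

# Request cancellation for every streaming job returned by the list call above
for job in response.get('jobs', []):
    if job.get('type') == 'JOB_TYPE_STREAMING':
        df_service.projects().locations().jobs().update(
            projectId=project_id,
            location='<region>',
            jobId=job['id'],
            body={'requestedState': 'JOB_STATE_CANCELLED'}).execute()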
Answer 2:
You can directly use the Dataflow REST API like this:
from google.auth.transport.requests import AuthorizedSession
import google.auth

base_url = 'https://dataflow.googleapis.com/v1b3/projects/'

# Obtain application default credentials (the project ID returned here is overridden below)
credentials, project_id = google.auth.default(scopes=['https://www.googleapis.com/auth/cloud-platform'])
project_id = 'PROJECT_ID'  # replace with your project ID
location = 'europe-west1'

# Call the Dataflow jobs.list endpoint with an authenticated session
authed_session = AuthorizedSession(credentials)
response = authed_session.request('GET', f'{base_url}{project_id}/locations/{location}/jobs')
print(response.json())
You have to install the google-auth dependency.
You can also add the query parameter ?filter=ACTIVE to get only the active Dataflow jobs, which should correspond to your streaming jobs.
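A minimal sketch of the same request with that filter applied, extracting the job IDs (the jobs and id field names follow the jobs.list response format):

# Reuse the authed_session from above and only request active jobs
response = authed_session.request(
    'GET', f'{base_url}{project_id}/locations/{location}/jobs?filter=ACTIVE')
active_job_ids = [job['id'] for job in response.json().get('jobs', [])]
print(active_job_ids)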
Source: https://stackoverflow.com/questions/62991245/how-to-list-down-all-the-dataflow-jobs-using-python-api