问题
My situation is as follows:
Google Account A has some data in BigQuery.
Google Account B manages Account A's BigQuery data, and has also been given editor privileges for Account A's Cloud Platform project.
Account B has a Sheet in Google Drive that has some cool reference data in it. Account B logs into the BQ Web console, and creates a table in Account A's BQ project that is backed by this sheet.
All is well. Account B can query and join to this table successfully within Account A's BQ data from the web UI.
Problem:
Google Account A also has a service account that is an editor for Google Account A's Cloud Platform Project. This service account manages and queries the data in BQ using the python google-cloud API. When this service account attempts to query the reference table that is backed by Account B's GDrive Sheet, the job fails with this error:
Encountered an error while globbing file pattern. JobID: "testing_gdrivesheet_query_job1"
Near as I can tell this is actually an authentication issue. How can I give Account A's service account appropriate access to Account B's GDrive so it can access that reference table?
Bonus Points: Is there any performance difference between a table backed by a GDrive Sheet vs a native BQ table?
回答1:
You should be able to get this working with the following steps:
First share the sheet with the email/"service account id" associated with the service account.
Then you'll be able to access your sheet-backed table if you create a Client with the bigquery and drive scopes. (You might need to have domain-wide-delegation enabled on the service account).
scopes = ['https://www.googleapis.com/auth/bigquery', 'https://www.googleapis.com/auth/drive']
credentials = ServiceAccountCredentials.from_json_keyfile_name(
'<path_to_json>', scopes=scopes)
# Instantiates a client
client = bigquery.Client(project = PROJECT, credentials = credentials)
bqQuery = client.run_sync_query(q)
bqQuery.run()
bqQuery.fetch_data()
回答2:
While Orbit's answer helped me to find a solution for the issue, there are a few more things you need to consider. Therefore, I like to add my detailed solution to the problem. This solution is required if Orbit's basic solution does not work, in particular, if you use the G Suite and your policies do not allow sharing sheets/docs with accounts outside of your domain. In this case you cannot directly share a doc/sheet with the service account.
Before you start:
- Create or select a service account in your project
- Enable Domain-wide Delegation (DwD) in the account settings. If not present, this generates an OAuth client ID for the service account.
- Make sure the delegated
user@company.com
has access to the sheet. Add the required scopes to your service account's OAuth client (you may need to ask a G Suite admin to do this for you):
https://www.googleapis.com/auth/bigquery
https://www.googleapis.com/auth/drive
If the delegated user can access your drive-based table in the BigQuery UI, your service account should now also be able to access it on behalf of the delegated user.
Here is a full code snippet that worked for me:
#!/usr/bin/env python
import httplib2
from google.cloud import bigquery
from oauth2client.service_account import ServiceAccountCredentials
scopes = [
"https://www.googleapis.com/auth/drive",
"https://www.googleapis.com/auth/bigquery",
]
delegated_user = "user@example.com"
project = 'project-name'
table = 'dataset-name.table-name'
query = 'SELECT count(*) FROM [%s:%s]' % (project, table)
creds = ServiceAccountCredentials.from_json_keyfile_name('secret.json', scopes=scopes)
creds = creds.create_delegated(delegated_user)
http = creds.authorize(httplib2.Http())
client = bigquery.Client(http=http)
bq = client.run_sync_query(query)
bq.run()
print bq.fetch_data()
Note that I was not able to setup the delegation directly and needed to create an HTTP client using creds = creds.create_delegated(delegated_user)
and http = creds.authorize(httplib2.Http())
. The authorized HTTP client can then be used as HTTP client for the BigQuery client: client = bigquery.Client(http=http)
.
Also note that the service account does not need to have any predefined roles assigned in the project settings, i.e., you do not have to make it a bigquery user or even a project owner. I suppose it acquires access primarily via delegation.
来源:https://stackoverflow.com/questions/40391128/how-do-i-authenticate-a-service-account-to-make-queries-against-a-gdrive-sheet-b