google-bigquery

Joins on Google BigQuery

Submitted by 坚强是说给别人听的谎言 on 2020-01-24 03:07:28
Question: I know that work is being done to improve the join feature on BigQuery. Not to rant here, but it will be hard to analyze "terabyte" sets of data as advertised if joins cannot be used properly. OK, back to the problem: I have two tables, one is 600 MB and the other is 50 MB. I tried to make a join and got an error saying the smaller table must be on the left. I did some research and found out that BigQuery considers both tables big if they are greater than 7 MB? So based on some advice I…
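
A minimal sketch of the situation for context, not the asker's code. In legacy BigQuery SQL, a plain JOIN required the right-hand table to be small (on the order of a few MB compressed), and JOIN EACH was the keyword for joining two large tables; standard SQL has no such restriction. Table and column names below are hypothetical, using the google-cloud-bigquery Python client.

    from google.cloud import bigquery

    client = bigquery.Client()

    # Standard SQL: joins two large tables directly, no size-based restriction.
    query = """
        SELECT a.user_id, b.purchase_amount
        FROM `my_project.my_dataset.big_table_600mb` AS a
        JOIN `my_project.my_dataset.table_50mb` AS b
        ON a.user_id = b.user_id
    """
    for row in client.query(query).result():
        print(row)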

BigQueryIO Read vs fromQuery

Submitted by 喜欢而已 on 2020-01-24 00:25:35
Question: Say in a Dataflow/Apache Beam program I am trying to read a table whose data is growing exponentially, and I want to improve the performance of the read:

    BigQueryIO.Read.from("projectid:dataset.tablename")

or

    BigQueryIO.Read.fromQuery("SELECT A, B FROM [projectid:dataset.tablename]")

Will the performance of my read improve if I select only the required columns in the table, rather than reading the entire table, in the above? I am aware that selecting a few columns results in reduced cost. But…
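
A minimal sketch of the two read styles, shown in the Beam Python SDK rather than the Java SDK the question uses; the table name is the asker's placeholder. Note that reading via a query typically first runs a query job and then reads its result, so projecting columns mainly reduces bytes scanned (and therefore cost) rather than guaranteeing a faster read.

    import apache_beam as beam

    with beam.Pipeline() as p:
        # Option 1: read the entire table.
        full = p | "ReadTable" >> beam.io.ReadFromBigQuery(
            table="projectid:dataset.tablename")

        # Option 2: read only the columns you need via a query.
        cols = p | "ReadQuery" >> beam.io.ReadFromBigQuery(
            query="SELECT A, B FROM `projectid.dataset.tablename`",
            use_standard_sql=True)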

Create dashboard on Firebase Database for various metrics

Submitted by 笑着哭i on 2020-01-23 17:31:28
Question: I have events in a Firebase Database table, where each event has certain fields. One of the fields is event_type. What I want to achieve is to visualize, in graphical form, how many events of each type come in daily. How do I do something like that in Firebase Database? Q1. Is it possible to do this directly in Firebase? Q2. Do I need to move the data to some other data source (like BigQuery) and set up the dashboard there?

Answer 1: It is definitely possible to create a dashboard with aggregate data…
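
A minimal sketch of the kind of aggregation a dashboard would need once the events have been exported to BigQuery; the project, dataset, table, and timestamp column names are hypothetical.

    from google.cloud import bigquery

    client = bigquery.Client()
    query = """
        SELECT DATE(event_timestamp) AS day,
               event_type,
               COUNT(*) AS event_count
        FROM `my_project.firebase_export.events`
        GROUP BY day, event_type
        ORDER BY day, event_type
    """
    for row in client.query(query).result():
        print(row.day, row.event_type, row.event_count)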

Where do you get Google BigQuery usage info (mainly for processed data)

Submitted by 本小妞迷上赌 on 2020-01-23 04:58:48
Question: I know that BigQuery offers the first 1 TB of data processed per month for free, but I can't figure out where on my dashboard to see my monthly usage. I used to be able to "revert" to the old dashboard, which had this info, but for the past couple of weeks the old dashboard hasn't been accessible.

Answer 1: From the Google Cloud Console overview page for your project, click on the "details" section at the top right, next to the charge estimate. You'll get an estimate of the charges for the…
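
A programmatic alternative worth noting: BigQuery's INFORMATION_SCHEMA.JOBS_BY_PROJECT view exposes total_bytes_processed per job (this view postdates the question). A minimal sketch; the region qualifier is an assumption, so substitute your own.

    from google.cloud import bigquery

    client = bigquery.Client()
    query = """
        SELECT SUM(total_bytes_processed) / POW(2, 40) AS tib_processed
        FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
        WHERE creation_time >= TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), MONTH)
          AND job_type = 'QUERY'
    """
    print(list(client.query(query).result()))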

Beam/Google Cloud Dataflow ReadFromPubsub Missing Data

Submitted by 守給你的承諾、 on 2020-01-23 03:32:07
Question: I have two Dataflow streaming pipelines (Pub/Sub to BigQuery) with the following code:

    class transform_class(beam.DoFn):
        def process(self, element, publish_time=beam.DoFn.TimestampParam, *args, **kwargs):
            logging.info(element)
            yield element

    class identify_and_transform_tables(beam.DoFn):
        # Adding publish timestamp.
        # Since I'm reading from a topic that consists of data from multiple tables,
        # the function here is to identify the tables and split them apart.

    def run(pipeline_args=None):
        # `save_main_session` …
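
The asker's code is truncated above, so as context here is a minimal runnable sketch of the same shape of pipeline (Pub/Sub in, BigQuery out) in the Beam Python SDK; the topic and table names are hypothetical.

    import json
    import logging

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    class LogAndPass(beam.DoFn):
        def process(self, element, publish_time=beam.DoFn.TimestampParam):
            # Log each message along with its Pub/Sub publish timestamp.
            logging.info("element=%s publish_time=%s", element, publish_time)
            yield json.loads(element.decode("utf-8"))

    def run(argv=None):
        options = PipelineOptions(argv, streaming=True, save_main_session=True)
        with beam.Pipeline(options=options) as p:
            (p
             | "Read" >> beam.io.ReadFromPubSub(
                   topic="projects/my-project/topics/my-topic")
             | "Transform" >> beam.ParDo(LogAndPass())
             | "Write" >> beam.io.WriteToBigQuery(
                   "my-project:my_dataset.my_table",
                   write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                   create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER))

    if __name__ == "__main__":
        logging.getLogger().setLevel(logging.INFO)
        run()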

How to run a dynamic second query in Google Cloud Dataflow?

Submitted by 青春壹個敷衍的年華 on 2020-01-23 03:29:05
Question: I'm attempting an operation where I get a list of IDs via a query, transform them into a comma-separated string (i.e. "1,2,3"), and then use that string in a second query. When attempting to run the second query, I get a syntax error: "Target type of a lambda conversion must be an interface".

    String query = "SELECT DISTINCT campaignId FROM `" + options.getEligibilityInputTable() + "` ";
    Pipeline p = Pipeline.create(options);
    p.apply("GetCampaignIds", BigQueryIO.readTableRows()…
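
One common approach, sketched in the Beam Python SDK (the question uses Java) under the assumption that the ID list is small: because a BigQuery source is fixed at pipeline-construction time, run the first query with the plain BigQuery client and splice the result into the second query before building the pipeline. Table names are hypothetical.

    import apache_beam as beam
    from google.cloud import bigquery

    bq = bigquery.Client()
    ids = [str(row.campaignId) for row in bq.query(
        "SELECT DISTINCT campaignId FROM `my_project.my_dataset.eligibility`"
    ).result()]

    second_query = (
        "SELECT * FROM `my_project.my_dataset.events` "
        "WHERE campaignId IN ({})".format(",".join(ids)))

    with beam.Pipeline() as p:
        rows = p | "ReadEvents" >> beam.io.ReadFromBigQuery(
            query=second_query, use_standard_sql=True)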

How to authenticate with gcloud BigQuery using a JSON credentials file?

Submitted by 浪子不回头ぞ on 2020-01-22 19:30:53
Question: The gcloud documentation for Google BigQuery states that authentication can be determined from from_service_account_json. I've tried the following:

    from gcloud import bigquery
    client = bigquery.Client.from_service_account_json('/Library/gcloud_api_credentials.json')

The JSON file looks like the following (note: the credentials are scrambled, so these are fake):

    {"type": "service_account",
     "project_id": "example_project",
     "private_key_id": "c7e371776ab6e2dsfafdsaff97edf9377178c8",
     "private…
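
A minimal sketch using the current package name (the legacy gcloud Python library was renamed google-cloud-bigquery); the credentials path is the asker's, and the list_datasets() call is just a hypothetical smoke test.

    from google.cloud import bigquery

    client = bigquery.Client.from_service_account_json(
        "/Library/gcloud_api_credentials.json")
    print(list(client.list_datasets()))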

Run a saved BigQuery query from Google Apps Script?

Submitted by 大憨熊 on 2020-01-22 12:48:28
Question: We frequently use Google Apps Script to run BigQuery queries and put the results into a Google Sheet. However, the workflow is annoying:

1. Run the query in BigQuery until you get it right.
2. Copy/paste into a text editor to add the newline slashes.
3. Run it in Apps Script and hope it works.
4. Go back to BigQuery and repeat 1-3 if something doesn't work.

Is there some way to just save a query using BigQuery's save function and then call that specific query from a script?

Answer 1: It's a workaround... try saving…
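
The answer above is truncated, so the following is an assumption on my part rather than the answerer's method: one common workaround is to save the query as a BigQuery view, after which any client (including Apps Script) only has to run a trivial SELECT * against it. A minimal sketch in the Python client; all names are hypothetical.

    from google.cloud import bigquery

    client = bigquery.Client()

    # Store the query logic server-side as a view.
    view = bigquery.Table("my_project.my_dataset.my_saved_query")
    view.view_query = """
        SELECT name, COUNT(*) AS n
        FROM `my_project.my_dataset.events`
        GROUP BY name
    """
    client.create_table(view, exists_ok=True)

    # Callers now need only a one-line query, with no logic to copy around.
    rows = client.query(
        "SELECT * FROM `my_project.my_dataset.my_saved_query`").result()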
