google-bigquery

Google BigQuery: exporting a table to Google Cloud Storage produces multiple files, and sometimes one single file

Submitted by 泄露秘密 on 2021-02-10 20:25:46
Question: I am using the BigQuery Python client libraries to export data from BigQuery tables into GCS in CSV format. I supplied a wildcard pattern, assuming that some tables can be more than 1 GB. Sometimes, even though a table is only a few MB, the export creates multiple files, and sometimes it creates just one file. Is there a logic behind this? My export workflow is the following: project = bq_project dataset_id = bq_dataset_id table_id = bq_table_id bucket_name = bq_bucket_name workflow_name = workflow_nm csv_file_nm = workflow_nm+"/
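The sharding behavior tracks the wildcard: when the destination URI contains a `*`, BigQuery is free to write any number of output shards (it parallelizes based on internal factors, not table size alone), whereas a single non-wildcard URI forces one file but fails for exports larger than 1 GB. A minimal sketch using the `EXPORT DATA` SQL statement, which always requires a wildcard URI; the project, dataset, table, and bucket names below are placeholders, not from the original post:

```sql
-- EXPORT DATA requires a wildcard in the URI; BigQuery decides how many
-- shards to write, so you may get one file or several for the same table.
EXPORT DATA OPTIONS (
  uri = 'gs://my-bucket/my-workflow/export-*.csv',
  format = 'CSV',
  overwrite = true,
  header = true
) AS
SELECT *
FROM `my-project.my_dataset.my_table`;
```

If a single file is a hard requirement and the table is under 1 GB, an extract job with a non-wildcard destination URI (via the Python client's extract_table) is the usual alternative.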

How to join two tables when the condition may contain a regex or array condition

Submitted by 杀马特。学长 韩版系。学妹 on 2021-02-10 18:47:02
Question: I have two tables, tab1 and tab2, with data as follows: tab1: tab2: The item descriptions in tab1 and tab2 do not match exactly. Is there any way to join these two tables to fetch the customer IDs? Thanks. Answer 1: Try below #standardSQL CREATE TEMPORARY FUNCTION similarity(Text1 STRING, Text2 STRING) RETURNS FLOAT64 LANGUAGE js AS """ var _extend = function(dst) { var sources = Array.prototype.slice.call(arguments, 1); for (var i=0; i<sources.length; ++i) { var src = sources[i]; for (var p in src
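The answer's JavaScript UDF computes a fuzzy similarity score; when the mismatch is simpler (one table's description is a substring or pattern of the other's), a plain regex join can work without a UDF. A sketch with invented sample data, since the original tables were not shown; the schemas and values here are assumptions:

```sql
#standardSQL
-- Hypothetical schemas: tab1(customer_id, item_description), tab2(item_description).
-- Each tab2 description is treated as a pattern to find inside tab1's description.
WITH tab1 AS (
  SELECT 101 AS customer_id, 'Apple iPhone 12 64GB' AS item_description UNION ALL
  SELECT 102, 'Samsung Galaxy S21 Ultra'
), tab2 AS (
  SELECT 'iPhone 12' AS item_description UNION ALL
  SELECT 'Galaxy S21'
)
SELECT t1.customer_id,
       t1.item_description,
       t2.item_description AS matched_pattern
FROM tab1 t1
JOIN tab2 t2
  ON REGEXP_CONTAINS(LOWER(t1.item_description), LOWER(t2.item_description));
```

For literal substring matching with no regex metacharacters involved, `STRPOS(...) > 0` is a cheaper equivalent condition.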

Cast a string into a date in BigQuery when the date is in the format M/D/YYYY

Submitted by 岁酱吖の on 2021-02-10 18:15:27
Question: I have a string that is a date in M/D/YYYY format, i.e.: 1/1/2018, 12/31/2018. I get an invalid date error (it shows '2/18/2018' as the invalid date). Any ideas? Answer 1: Below is an example for BigQuery Standard SQL #standardSQL WITH `project.dataset.table` AS ( SELECT '1/1/2018' date_as_string UNION ALL SELECT '12/31/2018' ) SELECT PARSE_DATE('%m/%d/%Y', date_as_string) date_as_date FROM `project.dataset.table` with output: Row date_as_date 1 2018-01-01 2 2018-12-31 Source: https://stackoverflow.com
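The "invalid date" error in the question typically means at least one row does not match the format string. When the data may contain malformed values, `SAFE.PARSE_DATE` returns NULL for those rows instead of failing the whole query. A sketch extending the answer's example with one deliberately bad row:

```sql
#standardSQL
-- SAFE.PARSE_DATE yields NULL (rather than an error) for rows that do not
-- match '%m/%d/%Y', so the query still succeeds on dirty data.
WITH `project.dataset.table` AS (
  SELECT '1/1/2018' AS date_as_string UNION ALL
  SELECT '12/31/2018' UNION ALL
  SELECT 'not-a-date'
)
SELECT date_as_string,
       SAFE.PARSE_DATE('%m/%d/%Y', date_as_string) AS date_as_date
FROM `project.dataset.table`;
```

Filtering with `WHERE SAFE.PARSE_DATE('%m/%d/%Y', date_as_string) IS NULL` is a quick way to locate the offending rows.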

BigQuery aggregation on a daily basis

Submitted by 为君一笑 on 2021-02-10 17:30:36
Question: I have a table in BigQuery (a data warehouse), and I would like to get the following result. Here is the explanation of how the calculation should work: 2017-10-01 = $100 is obvious, because there is only one row. 2017-10-02 = $400 is the sum of the first and third rows. Why? Because the second and third rows have the same invoice, so we only use the latest update. 2017-10-04 = $800 is the sum of rows 1, 3, and 4. Why? Because we only take one invoice per day: row 1 (T001), row 3 (T002), row 4 (T003
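One plausible reading of the calculation is: keep only the latest update per invoice, then produce a running total across days. The original table was not shown, so the schema below (invoice, amount, updated_at) and the exact semantics are assumptions; a sketch of that two-step shape:

```sql
#standardSQL
-- Step 1: keep only the most recent row per invoice (latest update wins).
-- Step 2: sum per day, then turn that into a cumulative total over days.
WITH latest_per_invoice AS (
  SELECT * EXCEPT (rn)
  FROM (
    SELECT *,
           ROW_NUMBER() OVER (
             PARTITION BY invoice
             ORDER BY updated_at DESC
           ) AS rn
    FROM `project.dataset.invoices`
  )
  WHERE rn = 1
),
daily AS (
  SELECT DATE(updated_at) AS day, SUM(amount) AS daily_total
  FROM latest_per_invoice
  GROUP BY day
)
SELECT day,
       SUM(daily_total) OVER (ORDER BY day) AS running_total
FROM daily
ORDER BY day;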

Make an existing BigQuery table clustered

Submitted by …衆ロ難τιáo~ on 2021-02-10 16:56:12
Question: I have quite a huge existing partitioned table in BigQuery. I want to make the table clustered, at least for new partitions. The documentation (https://cloud.google.com/bigquery/docs/creating-clustered-tables) says that you can create a clustered table when you load data, so I tried to load a new partition using clustering fields: job_config.clustering_fields = ["event_type"]. The load finished successfully; however, it seems that the new partition is not clustered
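Setting clustering_fields on a load job only takes effect when the job creates the table; loading into a partition of an already-created, non-clustered table does not retrofit clustering. One common workaround is to recreate the table with a `CLUSTER BY` DDL statement. A sketch, with table and column names assumed rather than taken from the post:

```sql
#standardSQL
-- Recreate the table clustered; CTAS preserves the data but rewrites it
-- into a new table whose metadata declares partitioning and clustering.
CREATE TABLE `project.dataset.events_clustered`
PARTITION BY DATE(event_timestamp)
CLUSTER BY event_type
AS
SELECT *
FROM `project.dataset.events`;
```

After verifying the new table, the old one can be dropped and the clustered copy renamed (or referenced) in its place; note that CTAS is billed as a query over the full source table.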

Update a nested field in BigQuery using another nested field as a condition

Submitted by 五迷三道 on 2021-02-10 15:54:55
Question: I am trying to update sourcePropertyDisplayName on a ga_sessions_ table WHERE it matches the value of another nested field. I found this answer: Update nested field in BigQuery table. But that one only has a very simple WHERE TRUE, whereas I only want to apply the update if it matches a specified hits.eventInfo.eventCategory. Here is what I have so far: UPDATE `dataset_name`.`ga_sessions_20170720` SET hits = ARRAY( SELECT AS STRUCT * REPLACE( (SELECT AS STRUCT sourcePropertyInfo.* REPLACE(
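The condition belongs inside the rebuilt array, not in the statement-level WHERE: an `IF` inside the `REPLACE` lets matching hits get the new struct while non-matching hits pass through unchanged (a WHERE inside the ARRAY subquery would silently drop them). A hedged completion of the statement above; 'my-category' and 'new display name' are placeholder values, not from the original post:

```sql
#standardSQL
-- Rebuild hits row by row: only hits whose eventInfo.eventCategory matches
-- get a replaced sourcePropertyDisplayName; all other hits are kept as-is.
UPDATE `dataset_name`.`ga_sessions_20170720`
SET hits = ARRAY(
  SELECT AS STRUCT h.* REPLACE (
    IF(h.eventInfo.eventCategory = 'my-category',
       (SELECT AS STRUCT h.sourcePropertyInfo.* REPLACE (
          'new display name' AS sourcePropertyDisplayName)),
       h.sourcePropertyInfo) AS sourcePropertyInfo
  )
  FROM UNNEST(hits) AS h
)
WHERE TRUE;
```

The statement-level `WHERE TRUE` remains only because BigQuery requires a WHERE clause on every UPDATE; the real filtering happens per element inside the ARRAY expression.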
