google-bigquery

Facing an issue understanding a BigQuery stored procedure used for pivoting

瘦欲 · submitted on 2020-07-10 10:26:52
Question: I am trying to pivot using a BigQuery stored procedure, as explained in this link. (The input table and required output were shown as images in the original post.) The first part of the stored procedure generates the list of values that will become the new columns, like below:

    EXECUTE IMMEDIATE (
      "SELECT STRING_AGG(' "
      || aggregation
      || """(IF('||@pivot_col_name||'="'||x.value||'", '||@pivot_col_value||', null)) '||x.value) FROM UNNEST(( SELECT APPROX_TOP_COUNT("""
      || pivot_col_name
      || ", @max_columns) FROM `"
      || table_name
      || "`)) x"
    )
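
A minimal sketch of what that string expands to may help: the inner query asks APPROX_TOP_COUNT for the most frequent pivot values, and STRING_AGG glues one aggregate expression per value into a column list. The table and column names below (project.dataset.sales, quarter, amount) are hypothetical stand-ins, and the sketch only previews the generated column list instead of running the full procedure:

    # Hypothetical names; this only previews the generated column list.
    from google.cloud import bigquery

    client = bigquery.Client()

    sql = """
    SELECT STRING_AGG(
      'SUM(IF(quarter = "' || x.value || '", amount, NULL)) AS ' || x.value)
    FROM UNNEST((
      SELECT APPROX_TOP_COUNT(quarter, 10)  -- at most 10 generated columns
      FROM `project.dataset.sales`)) AS x
    """

    # The single returned string is what the procedure splices into the final
    # pivot query before executing it with EXECUTE IMMEDIATE.
    print(next(iter(client.query(sql).result()))[0])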

BigQuery: look up an array of RECORD-type ids and join data from a secondary table using SQL

♀尐吖头ヾ · submitted on 2020-07-10 10:26:47
Question: I have a data structure like below:

    Products
    | name | region_ids    |
    |------|---------------|
    | shoe | c32, a43, x53 |
    | hat  | c32, f42      |

    # Schema
    name              STRING  NULLABLE
    region_ids        RECORD  REPEATED
    region_ids.value  STRING  NULLABLE

    Regions
    | _id | name       |
    |-----|------------|
    | c32 | london     |
    | a43 | manchester |
    | x53 | bristol    |
    | f42 | liverpool  |

    # Schema
    _id   STRING  NULLABLE
    name  STRING  NULLABLE

I want to look up the array of "region_ids" and replace the ids with the region names, to result …
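
A hedged sketch of one way to do the lookup: UNNEST the repeated region_ids record so each element becomes a row, join each element against Regions, then fold the names back into an array. The project/dataset names are assumptions:

    from google.cloud import bigquery

    client = bigquery.Client()

    sql = """
    SELECT
      p.name,
      ARRAY_AGG(r.name) AS region_names  -- fold the names back into an array
    FROM `project.dataset.Products` AS p,
         UNNEST(p.region_ids) AS rid     -- one row per repeated-record element
    JOIN `project.dataset.Regions` AS r
      ON r._id = rid.value               -- each element exposes its .value field
    GROUP BY p.name
    """

    for row in client.query(sql).result():
        print(row.name, row.region_names)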

Google BigQuery Schema conflict (pyarrow error) with Numeric data type using load_table_from_dataframe

江枫思渺然 · submitted on 2020-07-10 08:44:06
Question: I get the following error when I upload numeric data (int64 or float64) from a Pandas dataframe into a NUMERIC Google BigQuery column:

    pyarrow.lib.ArrowInvalid: Got bytestring of length 8 (expected 16)

I tried changing the datatype of the 'tt' field in the Pandas dataframe, without success:

    df_data_f['tt'] = df_data_f['tt'].astype('float64')

and

    df_data_f['tt'] = df_data_f['tt'].astype('int64')

using the schema:

    job_config.schema = [ ... bigquery.SchemaField('tt', 'NUMERIC') ... ]

Reading this …
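
The lengths in the error are the clue: BigQuery NUMERIC maps to a 16-byte decimal128 value in pyarrow, while int64/float64 columns serialize as 8 bytes. The usual fix is to hold decimal.Decimal objects in the column rather than casting between native numeric dtypes. A sketch with made-up data and a placeholder table id:

    from decimal import Decimal

    import pandas as pd
    from google.cloud import bigquery

    client = bigquery.Client()

    df_data_f = pd.DataFrame({'tt': [1.5, 2.25]})  # hypothetical data
    # NUMERIC carries 9 decimal digits of scale, so quantize while converting.
    df_data_f['tt'] = df_data_f['tt'].apply(
        lambda v: Decimal(str(v)).quantize(Decimal('1.000000000')))

    job_config = bigquery.LoadJobConfig(
        schema=[bigquery.SchemaField('tt', 'NUMERIC')])
    client.load_table_from_dataframe(
        df_data_f, 'project.dataset.table', job_config=job_config).result()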

REGEXP_MATCH in BigQuery Standard SQL

喜夏-厌秋 · submitted on 2020-07-06 09:33:05
Question: Although the BigQuery Standard SQL documentation mentions the function REGEXP_MATCH [1], it seems to be unavailable when running a query; the web interface returns:

    Error: Function not found: REGEXP_MATCH

What would be an alternative to using it?

[1] https://cloud.google.com/bigquery/sql-reference/functions-and-operators#regexp_match

Answer 1: "What would be an alternative to using it?" You should use REGEXP_CONTAINS.

Source: https://stackoverflow.com/questions/38575732/regexp-match-in-bigquery
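
For concreteness, a quick illustration of the swap run against a public dataset through the Python client (the query and table choice are mine, not from the thread):

    from google.cloud import bigquery

    client = bigquery.Client()
    sql = """
    SELECT word
    FROM `bigquery-public-data.samples.shakespeare`
    WHERE REGEXP_CONTAINS(word, r'^pro')  -- was: REGEXP_MATCH(word, r'^pro')
    LIMIT 5
    """
    for row in client.query(sql).result():
        print(row.word)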

Elasticsearch with Google BigQuery

那年仲夏 · submitted on 2020-07-04 09:18:25
Question: I have event logs loaded into an Elasticsearch engine, and I visualize them using Kibana. The event logs are actually stored in a Google BigQuery table. Currently I dump JSON files to a Google Cloud Storage bucket and download them to a local drive; then, using Logstash, I move the JSON files from the local drive into Elasticsearch. Now I am trying to automate the process by establishing a connection between Google BigQuery and Elasticsearch. From what I have read, I understand that …
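
One way to automate at least the first leg of that pipeline is to replace the manual dump-and-download with a scheduled BigQuery extract job writing newline-delimited JSON to GCS, which Logstash can then pick up (e.g. via its google_cloud_storage input plugin) without a local drive in between. A hedged sketch; the table id and bucket path are placeholders:

    from google.cloud import bigquery

    client = bigquery.Client()

    # Export the table as sharded newline-delimited JSON files in GCS.
    extract_config = bigquery.ExtractJobConfig(
        destination_format=bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON)
    extract_job = client.extract_table(
        'my-project.my_dataset.event_logs',         # placeholder table id
        'gs://my-bucket/event-logs/export-*.json',  # placeholder bucket
        job_config=extract_config)
    extract_job.result()  # block until the export finishes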

How to infer an Avro schema from a Kafka topic in Apache Beam KafkaIO

柔情痞子 · submitted on 2020-07-03 12:59:10
Question: I'm using Apache Beam's KafkaIO to read from a topic whose Avro schema is in the Confluent schema registry. I'm able to deserialize the messages and write them to files, but ultimately I want to write to BigQuery. My pipeline isn't able to infer the schema. How do I extract/infer the schema and attach it to the data in the pipeline so that my downstream processes (writing to BigQuery) can use it? Here is the code where I use the schema registry URL to set the deserializer and where I read …
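
The pipeline cannot guess a BigQuery schema on its own, but since the Avro schema already lives in the registry, one option is to fetch it before the pipeline starts and translate it into BigQuery form for the sink. A sketch under assumptions: the registry URL and subject name are placeholders, and only flat records with primitive (optionally null-union) fields are mapped:

    import json

    from confluent_kafka.schema_registry import SchemaRegistryClient

    AVRO_TO_BQ = {'string': 'STRING', 'int': 'INTEGER', 'long': 'INTEGER',
                  'float': 'FLOAT', 'double': 'FLOAT', 'boolean': 'BOOLEAN',
                  'bytes': 'BYTES'}

    def bq_schema_from_registry(url, subject):
        registry = SchemaRegistryClient({'url': url})
        avro = json.loads(
            registry.get_latest_version(subject).schema.schema_str)
        fields = []
        for f in avro['fields']:
            t = f['type']
            nullable = isinstance(t, list)  # unions with null become NULLABLE
            if nullable:
                t = next(x for x in t if x != 'null')
            fields.append({'name': f['name'],
                           'type': AVRO_TO_BQ[t],
                           'mode': 'NULLABLE' if nullable else 'REQUIRED'})
        return {'fields': fields}

    # The returned dict is accepted by beam.io.WriteToBigQuery(..., schema=...).
    print(bq_schema_from_registry('http://localhost:8081', 'my-topic-value'))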

Convert Long table to wide table in BigQuery

只谈情不闲聊 · submitted on 2020-07-02 03:06:38
Question: I have a BigQuery table like this (the table and the required output were shown as images in the original post). Note: the keys in the Extended_property_key column are not fixed; new keys are added frequently, so the columns in the output will also keep growing. I need to build a BigQuery query that can handle dynamically adding output columns while pivoting.

Answer 1: Below is for BigQuery Standard SQL:

    EXECUTE IMMEDIATE ''' SELECT account_id, ''' || (
      SELECT STRING_AGG(DISTINCT "MAX(IF(Extended_property_key = '"
        || Extended_property_key || "', …
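
As a runnable end-to-end version of that pattern, a sketch with assumed names (project.dataset.long_table for the table, Extended_property_value for the value column; the keys are also assumed to be valid column identifiers), executed through the Python client:

    from google.cloud import bigquery

    client = bigquery.Client()

    sql = """
    EXECUTE IMMEDIATE '''
    SELECT account_id, ''' || (
      -- One MAX(IF(...)) per distinct key, so newly arriving keys
      -- automatically become new output columns on the next run.
      SELECT STRING_AGG(DISTINCT
        "MAX(IF(Extended_property_key = '" || Extended_property_key
        || "', Extended_property_value, NULL)) AS " || Extended_property_key)
      FROM `project.dataset.long_table`
    ) || '''
    FROM `project.dataset.long_table`
    GROUP BY account_id
    '''
    """

    for row in client.query(sql).result():
        print(dict(row))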

Does indentation matter in SQL?

生来就可爱ヽ(ⅴ<●) · submitted on 2020-06-29 04:07:26
Question: I'm doing a micro-course on Kaggle in which two seemingly identical blocks (differing only in indentation) produce different results.

    1. answers_query = """
                       SELECT a.id, a.body, a.owner_user_id
                       FROM `bigquery-public-data.stackoverflow.posts_questions` AS q
                       INNER JOIN `bigquery-public-data.stackoverflow.posts_answers` AS a
                       ON q.id = a.parent_id
                       WHERE q.tags LIKE '%bigquery%'
                       """

       # Set up the query
       safe_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**10)
       answers_query_job = client.query …
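
For what it's worth, whitespace outside quoted literals is insignificant to BigQuery, which can be checked directly; the one place indentation can change results is inside a string literal, for example a LIKE pattern that silently absorbs spaces or a newline. A minimal check against the same public dataset (the query text is mine):

    from google.cloud import bigquery

    client = bigquery.Client()
    config = bigquery.QueryJobConfig(maximum_bytes_billed=10**10)

    flat = "SELECT COUNT(*) AS n FROM `bigquery-public-data.stackoverflow.posts_questions` WHERE tags LIKE '%bigquery%'"
    indented = """
        SELECT COUNT(*) AS n
        FROM `bigquery-public-data.stackoverflow.posts_questions`
        WHERE tags LIKE '%bigquery%'
    """

    a = list(client.query(flat, job_config=config).result())[0].n
    b = list(client.query(indented, job_config=config).result())[0].n
    assert a == b  # identical: indentation alone cannot change a result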

Getting an “Invalid schema update. Cannot add fields” error from BigQuery, with ALLOW_FIELD_ADDITION set in the configuration

久未见 · submitted on 2020-06-29 03:53:08
Question: The following Python code snippet produces the error in the title:

    job_config = bigquery.QueryJobConfig()

    # Set the destination table
    table_ref = client.dataset(args.bq_dataset_id).table(args.bq_cum_table)
    job_config.destination = table_ref
    job_config.write_disposition = 'WRITE_APPEND'
    job_config.schemaUpdateOptions = ['ALLOW_FIELD_ADDITION', 'ALLOW_FIELD_RELAXATION']

    # Start the query, passing in the extra configuration.
    query_job = client.query(
        sqlstr,
        # Location must match that of the …
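
A hedged guess at the cause, based on how the Python client spells this option: QueryJobConfig has no schemaUpdateOptions attribute, so the assignment above creates an ordinary Python attribute that is never sent with the job, and the query runs with no schema-update options at all. The client property is schema_update_options. A sketch with a placeholder destination table:

    from google.cloud import bigquery

    client = bigquery.Client()

    job_config = bigquery.QueryJobConfig()
    # Placeholder destination; substitute the real project/dataset/table ids.
    job_config.destination = bigquery.TableReference.from_string(
        'my-project.my_dataset.my_cum_table')
    job_config.write_disposition = bigquery.WriteDisposition.WRITE_APPEND
    job_config.schema_update_options = [  # snake_case: actually applied
        bigquery.SchemaUpdateOption.ALLOW_FIELD_ADDITION,
        bigquery.SchemaUpdateOption.ALLOW_FIELD_RELAXATION,
    ]

    client.query('SELECT 1 AS brand_new_field', job_config=job_config).result()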
