google-bigquery

Google Dataflow: insert + update in BigQuery in a streaming pipeline

…衆ロ難τιáo~ submitted on 2021-01-07 02:29:50
Question: The main objective: a Python streaming pipeline in which I read the input from Pub/Sub. After the input is analyzed, two options are available: if x=1, insert; if x=2, update. Testing: this cannot be done with the built-in Apache Beam functions, so you need to develop it using the 0.25 API of BigQuery (currently the version supported in Google Dataflow). The problem: the inserted records are still in the BigQuery streaming buffer, so the update statement fails: UPDATE or DELETE statement over table `table` would affect rows in the streaming buffer, which is not supported.
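
A minimal sketch of the branch described above, assuming the modern google-cloud-bigquery client rather than the 0.25 API mentioned in the question; the table name (my_project.my_dataset.my_table) and the fields x, id and value are hypothetical. The UPDATE branch is exactly the statement that fails while freshly streamed rows are still in the buffer:

```python
import json

import apache_beam as beam
from google.cloud import bigquery


class InsertOrUpdate(beam.DoFn):
    """Routes each Pub/Sub message to a streaming insert or a DML UPDATE."""

    def setup(self):
        # One client per worker instance.
        self.client = bigquery.Client()

    def process(self, message):
        row = json.loads(message)
        if row["x"] == 1:
            # Streaming insert: rows land in the streaming buffer first.
            errors = self.client.insert_rows_json(
                "my_project.my_dataset.my_table", [row]
            )
            if errors:
                raise RuntimeError(f"insert failed: {errors}")
        elif row["x"] == 2:
            # DML UPDATE: raises the "streaming buffer" error if it touches
            # rows streamed in within the buffer window.
            job = self.client.query(
                "UPDATE `my_project.my_dataset.my_table` "
                "SET value = @value WHERE id = @id",
                job_config=bigquery.QueryJobConfig(
                    query_parameters=[
                        bigquery.ScalarQueryParameter("value", "STRING", row["value"]),
                        bigquery.ScalarQueryParameter("id", "STRING", row["id"]),
                    ]
                ),
            )
            job.result()  # wait, so DML errors surface in the pipeline
```

A common way around the restriction is to make the pipeline insert-only (append an event for x=2 as well) and resolve the latest state later with a MERGE or a deduplicating view, once the rows have left the streaming buffer.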

Apache Beam write to BigQuery table and schema as params

梦想与她 submitted on 2021-01-07 01:29:52
Question: I'm using the Python SDK for Apache Beam. The values for the data table and the schema are in the PCollection. This is the message I read from Pub/Sub: {"DEVICE":"rms005_m1","DATESTAMP":"2020-05-29 20:54:26.733 UTC","SINUMERIK__x_position":69.54199981689453,"SINUMERIK__y_position":104.31400299072266,"SINUMERIK__z_position":139.0850067138672} Then I want to write it to BigQuery, using the values in the JSON message with a lambda function for the data table and this function for the schema: def
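
A sketch of how this can look with WriteToBigQuery, where table is a callable of the element and schema a callable of the destination; the subscription, project and dataset names are placeholders, and the schema string is inferred from the sample message above:

```python
import json

import apache_beam as beam


def schema_for(destination):
    # Called per destination table; fixed here, but it could be looked up
    # based on the destination name.
    return (
        "DEVICE:STRING,DATESTAMP:TIMESTAMP,"
        "SINUMERIK__x_position:FLOAT,"
        "SINUMERIK__y_position:FLOAT,"
        "SINUMERIK__z_position:FLOAT"
    )


with beam.Pipeline() as p:  # streaming pipeline options omitted for brevity
    (
        p
        | "Read" >> beam.io.ReadFromPubSub(
            subscription="projects/my-project/subscriptions/my-sub"
        )
        | "Parse" >> beam.Map(json.loads)
        | "Write" >> beam.io.WriteToBigQuery(
            # The table may be a callable of the element: route by DEVICE.
            table=lambda row: f"my-project:my_dataset.{row['DEVICE']}",
            schema=schema_for,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```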

Save Array<T> in BigQuery using Java

生来就可爱ヽ(ⅴ<●) submitted on 2021-01-07 01:28:03
Question: I'm trying to save data into BigQuery using the Spark BigQuery connector. Let's say I have a Java POJO like the one below: @Getter @Setter @AllArgsConstructor @ToString @Builder public class TagList { private String s1; private List<String> s2; } Now, when I try to save this POJO into BigQuery, it throws the error below: Caused by: com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.BigQueryException: Failed to load to test_table1 in job JobId{project=<project_id>, job=<job_id>, location
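
Not the accepted answer, just a sketch of the same write path with the knob most often suggested for array columns, the connector's intermediateFormat option; shown in PySpark to keep the examples in one language, with made-up table and bucket names:

```python
from pyspark.sql import SparkSession

# The spark-bigquery-connector jar must be on the classpath, e.g. via --packages.
spark = SparkSession.builder.appName("array-to-bq").getOrCreate()

# A DataFrame equivalent of the TagList POJO: a string plus an array column.
df = spark.createDataFrame(
    [("a", ["x", "y"]), ("b", ["z"])],
    schema="s1 STRING, s2 ARRAY<STRING>",
)

(
    df.write.format("bigquery")
    .option("table", "my_project.my_dataset.test_table1")
    .option("temporaryGcsBucket", "my-temp-bucket")
    # Array columns have historically loaded more reliably via Avro than via
    # the default Parquet intermediate format.
    .option("intermediateFormat", "avro")
    .mode("append")
    .save()
)
```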

Array concatenation with distinct elements in BigQuery

瘦欲@ submitted on 2021-01-06 07:24:20
Question: Let's say in each row I have an id and two arrays, array_1 and array_2, that look like the following: SELECT 'a' id, [1,2,3,4,5] array_1, [2,2,2,3,6] array_2 UNION ALL SELECT 'b', [2,3,4,5,6], [7,7,8,6,9] UNION ALL SELECT 'c', [], [1,4,5] I want to concatenate these two arrays and keep only the unique elements in the new array. My desired output would look like the following: +----+-----------+-----------+-----------------------------+ | id | array_1 | array_2 | concatenated_array_distinct | +----+----
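
One standard-SQL way to get the distinct union is to UNNEST the concatenation and re-aggregate it with ARRAY(SELECT DISTINCT ...); a runnable sketch using the Python client, with the WITH clause reproducing the sample rows above:

```python
from google.cloud import bigquery

client = bigquery.Client()

query = """
WITH data AS (
  SELECT 'a' id, [1,2,3,4,5] array_1, [2,2,2,3,6] array_2 UNION ALL
  SELECT 'b', [2,3,4,5,6], [7,7,8,6,9] UNION ALL
  SELECT 'c', [], [1,4,5]
)
SELECT
  id,
  array_1,
  array_2,
  -- ARRAY_CONCAT joins the two arrays; DISTINCT inside ARRAY() deduplicates.
  ARRAY(
    SELECT DISTINCT x
    FROM UNNEST(ARRAY_CONCAT(array_1, array_2)) AS x
  ) AS concatenated_array_distinct
FROM data
"""

for row in client.query(query).result():
    print(row.id, row.concatenated_array_distinct)
```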

TextJoin-like function based on a condition in SQL

纵然是瞬间 submitted on 2021-01-05 08:59:46
Question: Trying to figure out if it is possible to do a TEXTJOIN-like function in SQL based on a condition. Right now the only way I can think of doing it is by running a pivot to turn the column's rows into columns and aggregating them that way. I think this is the only way to transpose the data in SQL? Input: this would be a SQL table (tbl_fruit) that exists as the image depicts. SELECT * FROM tbl_fruit Output: Answer 1: Below is for BigQuery Standard SQL (without specifically listing each column, thus in a way
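
BigQuery's closest equivalent of TEXTJOIN is STRING_AGG, with the condition expressed as a WHERE clause (or an IF inside the aggregate). Since tbl_fruit was only shown as an image, the columns below (fruit, selected) are made up for illustration:

```python
from google.cloud import bigquery

client = bigquery.Client()

query = """
WITH tbl_fruit AS (
  SELECT 'apple' AS fruit, 1 AS selected UNION ALL
  SELECT 'banana', 0 UNION ALL
  SELECT 'cherry', 1
)
SELECT
  -- STRING_AGG concatenates matching rows with a delimiter, like TEXTJOIN;
  -- the WHERE clause supplies the condition.
  STRING_AGG(fruit, ', ' ORDER BY fruit) AS joined_fruit
FROM tbl_fruit
WHERE selected = 1
"""

for row in client.query(query).result():
    print(row.joined_fruit)  # apple, cherry
```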

Debugging BigQuery Stored procedure

梦想与她 submitted on 2021-01-04 05:54:09
Question: Is there any way I can use print statements within a BigQuery stored procedure? I have a stored procedure like the one below; I'd like to see how the SQL statement is generated in order to debug the issue, or any other better way to debug what the stored procedure is producing. CREATE OR REPLACE PROCEDURE `myproject.TEST.check_duplicated_prc`(project_name STRING, data_set_name STRING, table_name STRING, date_id DATE) BEGIN DECLARE sql STRING; SET sql = 'Select date,col1,col2,col3,count(1) from `'||project_name||'.'|
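
There is no PRINT statement in BigQuery scripting, but a bare SELECT inside a script or procedure returns its own result set, which works as a debug print for the generated SQL. A sketch, with the script body standing in for the procedure above:

```python
from google.cloud import bigquery

client = bigquery.Client()

script = """
DECLARE sql STRING;
SET sql = 'SELECT 1 AS dummy';
SELECT sql AS debug_generated_sql;  -- acts like a print statement
EXECUTE IMMEDIATE sql;
"""

parent_job = client.query(script)
parent_job.result()  # wait for the whole script to finish

# Each statement in a script runs as a child job; listing them shows the
# debug SELECT alongside the statements that actually executed.
for child in client.list_jobs(parent_job=parent_job):
    print(child.query)
```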

Getting module 'google.protobuf.descriptor_pool' has no attribute 'Default' in my python script

爱⌒轻易说出口 submitted on 2021-01-02 05:42:18
Question: I am new to Python and was using a Python script written by someone else. I was running it fine on a different PC; I just had to install a couple of packages, including pip3, google-cloud, google-cloud-bigquery and pandas. Now that I have installed the same packages on a different PC, I am unable to run the script. It showed the following error first: module = 'google.protobuf.descriptor_pb2' TypeError: expected bytes, Descriptor found However, when I purged/re-installed/updated the packages and also
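
Errors of this shape usually point at a protobuf package that is out of step with the generated code shipped in the google-cloud libraries, so comparing the installed versions on the two PCs is the first step; a small diagnostic sketch (the package list is illustrative):

```python
# Print the versions of the packages most likely involved in the mismatch.
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+

for pkg in ("protobuf", "google-cloud-bigquery", "google-api-core", "pandas"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")

# If protobuf is older on the failing PC, upgrading it, e.g.
#   pip install --upgrade protobuf
# is the usual fix for AttributeError on descriptor_pool.Default.
```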