google-bigquery

Compute percentiles by group in BigQuery

99封情书 submitted on 2021-01-29 05:11:08
Question: After searching around, I could not find a solution to this. Take the following example:

with my_data as (
  select 1 as num, 'a' as letter union all
  select 2 as num, 'a' as letter union all
  select 3 as num, 'a' as letter union all
  select 4 as num, 'a' as letter union all
  select 5 as num, 'a' as letter union all
  select 6 as num, 'b' as letter union all
  select 7 as num, 'b' as letter union all
  select 8 as num, 'b' as letter union all
  select 9 as num, 'b' as letter union all
  select 10 as num, 'b'
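The excerpt is cut off, but a minimal sketch of one common approach in BigQuery standard SQL follows. PERCENTILE_CONT is an analytic (window) function, so it is combined here with SELECT DISTINCT to collapse the output to one row per group; the percentile fractions are arbitrary examples.

select distinct
  letter,
  percentile_cont(num, 0.25) over (partition by letter) as p25,
  percentile_cont(num, 0.50) over (partition by letter) as p50,
  percentile_cont(num, 0.90) over (partition by letter) as p90
from my_data

For large tables, APPROX_QUANTILES(num, 100) with GROUP BY letter is a cheaper approximate alternative.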

bq command-line tool: query fails when text includes “ > ” or “ < ”

帅比萌擦擦* submitted on 2021-01-29 04:24:10
Question: I'm having problems using the bq command-line tool to run queries that contain a > or < symbol. The first two examples below show that when I try to select rows from a table where id > 300, nothing is returned, yet when I select for id=301 I get a result. The second two examples show that when I try to select rows where id < 300, I get a syntax error, but when I select for id=299 I get a result. Does anyone know why this is happening and how to fix it? Many thanks, Steve C:\Users\stephen
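Given the Windows prompt in the transcript, the likely culprit is cmd.exe rather than bq: an unquoted > is treated as output redirection and an unquoted < as input redirection, so bq never receives the full query. A hedged fix, with a placeholder table name: quote the whole query so the shell passes the operators through untouched.

bq query "SELECT id FROM mydataset.mytable WHERE id > 300"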

Writing failed row inserts in a streaming job to BigQuery using the Apache Beam Java SDK?

白昼怎懂夜的黑 submitted on 2021-01-29 02:40:03
Question: While running a streaming job, it's always good to have logs of the rows that failed to insert into BigQuery. Catching those and writing them into another BigQuery table gives an idea of what went wrong. Below are the steps you can try to achieve this. Answer 1: Prerequisites: apache-beam >= 2.10.0 or latest. Using the getFailedInsertsWithErr() function available in the SDK, you can easily catch the failed inserts and push them to another table for performing RCA. This becomes an

Error in bq load “Could not convert value to string”

旧时模样 submitted on 2021-01-29 02:04:15
Question: I tried to load logs from Google Cloud Storage to BigQuery with the bq command and got the error "Could not convert value to string". My example data:

{"ids":"1234,5678"}
{"ids":1234}

My example schema:

[ { "name":"ids", "type":"string" } ]

It seems a lone ID can't be converted because it isn't quoted. The data is produced by fluent-plugin-s3: when more than one ID is joined by a comma the value is wrapped in quotes, but a single ID is emitted as a bare number. How can I load this data into BigQuery? Thanks in advance. Answer 1:
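The answer text is cut off. One workaround sketch (an assumption, not necessarily the original answer): load each log line into a single STRING column (here called raw, on a hypothetical table raw_logs) and extract the ids in SQL; JSON_EXTRACT_SCALAR returns the value as a string whether or not the JSON scalar was quoted.

select json_extract_scalar(raw, '$.ids') as ids
from my_dataset.raw_logs

Alternatively, fixing the producer so that single IDs are also emitted as quoted strings makes the data match the declared STRING schema directly.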

BigQuery converting string to datetime

北城以北 submitted on 2021-01-28 21:20:47
Question: I'm using BigQuery. I have a table with a string column called DATAUTILIZACAO that has the following sample values:

02/11/16 12:19:08,000000
02/11/16 17:39:41,000000

The text is formatted as "DD/MM/YY HH:mm:ss". I need to create a new column of type DATETIME containing the value of DATAUTILIZACAO. How can I take the value from DATAUTILIZACAO, format it as "YYYY-MM-DD HH:MI:SS", and save it to the new column? Can I do that using a query + UDF directly? Thanks, Leo Answer 1: Try below (for
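The answer is truncated at "Try below". A sketch of the standard-SQL approach it most likely opens with, assuming the microseconds after the comma can be dropped (my_table is a placeholder):

select
  DATAUTILIZACAO,
  parse_datetime('%d/%m/%y %H:%M:%S', split(DATAUTILIZACAO, ',')[offset(0)]) as data_utilizacao_dt
from my_table

PARSE_DATETIME returns a DATETIME, which already renders as YYYY-MM-DD HH:MM:SS, so no separate formatting step is needed.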

BigQuery connector ClassNotFoundException in PySpark on Dataproc

微笑、不失礼 submitted on 2021-01-28 20:07:28
Question: I'm trying to run a script in PySpark on Dataproc. The script is something of a merge between this example and what I need to do, as I wanted to check that everything works. Obviously, it doesn't. The error I get is:

File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
: java.lang.ClassNotFoundException: com.google.cloud.hadoop.io
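The missing class is part of the Hadoop BigQuery connector, which is not on the classpath by default when calling newAPIHadoopRDD. A hedged fix (the jar path is an assumption based on the Google-hosted connector bucket of that era; the cluster and script names are placeholders): pass the connector jar when submitting the job.

gcloud dataproc jobs submit pyspark my_script.py \
  --cluster=my-cluster \
  --jars=gs://hadoop-lib/bigquery/bigquery-connector-hadoop2-latest.jar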

How do I parse value from JSON array into columns in BigQuery

笑着哭i submitted on 2021-01-28 19:33:05
Question: I have a JSON array that is similar to this:

{"key":"Email","slug":"customer-email","value":"abc@gmail.com"}
{"key":"Phone Number","slug":"mobile-phone-number","value":"123456789"}
{"key":"First Name","slug":"first-name","value":"abc"}
{"key":"Last Name","slug":"last-name","value":"xyz"}
{"key":"Date of birth","slug":"date-of-birth","value":"01/01/1990"}

I am hoping to turn the array into columns like this:

email | phoneNumber | firstName | lastName | dob
abc@gmail.com | 123456789 | abc | xyz | 01/01
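A sketch of one way to pivot these in BigQuery standard SQL, assuming the JSON objects sit in a repeated STRING column attrs on a table my_table keyed by id (all three names are hypothetical): extract slug and value from each element, then fold the array into columns with MAX(IF(...)).

select
  id,
  max(if(json_extract_scalar(attr, '$.slug') = 'customer-email',      json_extract_scalar(attr, '$.value'), null)) as email,
  max(if(json_extract_scalar(attr, '$.slug') = 'mobile-phone-number', json_extract_scalar(attr, '$.value'), null)) as phoneNumber,
  max(if(json_extract_scalar(attr, '$.slug') = 'first-name',          json_extract_scalar(attr, '$.value'), null)) as firstName,
  max(if(json_extract_scalar(attr, '$.slug') = 'last-name',           json_extract_scalar(attr, '$.value'), null)) as lastName,
  max(if(json_extract_scalar(attr, '$.slug') = 'date-of-birth',       json_extract_scalar(attr, '$.value'), null)) as dob
from my_table, unnest(attrs) as attr
group by id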

Calling beam.io.WriteToBigQuery in a beam.DoFn

▼魔方 西西 submitted on 2021-01-28 19:11:33
Question: I've created a Dataflow template with some parameters. When I write the data to BigQuery, I would like to make use of these parameters to determine which table it is supposed to write to. I've tried calling WriteToBigQuery in a ParDo as suggested in the following link: How can I write to Big Query using a runtime value provider in Apache Beam? The pipeline ran successfully, but it is not creating or loading data to BigQuery. Any idea what might be the issue?

def run(): pipeline_options =

Calculate a running total with a condition in BigQuery

南楼画角 submitted on 2021-01-28 19:02:05
Question: Sorry for the bad topic title... I need to calculate a running total but need to reset the total on a condition (when Expected reached = 0). I have this table:

Date, Registrations, Expected Registrations, Expected reached
2020-03-01, 5, 4, 1
2020-03-02, 7, 5, 1
2020-03-03, 8, 6, 1
2020-03-04, 2, 5, 0
2020-03-05, 5, 4, 1
2020-03-06, 7, 5, 1
2020-03-07, 8, 6, 1
2020-03-08, 2, 5, 0

Expected result with running total: the condition is that while “Expected reached” <> 0 the running total should be calculated. If “Expected
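The question is cut off, but a common reading is that the running total restarts at every row where Expected reached is 0. A sketch under that assumption (my_table is a placeholder; column names are normalized with underscores): a cumulative COUNTIF of the zero rows yields a group id, and the running SUM is taken within each group.

select
  Date,
  Registrations,
  sum(Registrations) over (partition by grp order by Date) as running_total
from (
  select *,
    countif(Expected_reached = 0) over (order by Date) as grp
  from my_table
)
order by Date

Note that each zero row starts its own group here; if the reset should instead take effect on the row after the zero, count the zeros over the preceding rows only (rows between unbounded preceding and 1 preceding).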