google-bigquery

Load Pandas DF to BigQuery fails

Submitted by 大憨熊 on 2021-01-21 09:21:14
Question: I'm using the following code (based on the pandas-gbq migration example):

    from google.cloud import bigquery
    import pandas
    import os

    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "link_to_credentials.json"

    df = pandas.DataFrame(
        {
            'my_string': ['a', 'b', 'c'],
            'my_int64': [1, 2, 3],
            'my_float64': [4.0, 5.0, 6.0],
        }
    )

    client = bigquery.Client()
    dataset_ref = client.dataset('TMP')
    table_ref = dataset_ref.table('yosh_try_uload_from_client')
    client.load_table_from_dataframe(df, table_ref)
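
The question is cut off before the actual error, but a frequent cause is that load_table_from_dataframe returns a LoadJob without waiting for it to finish, so the script exits before the load completes or the failure is ever surfaced (another common culprit is a missing pyarrow dependency). A minimal sketch, assuming the same TMP dataset and table from the question, that blocks on the job and raises the real server-side error:

    from google.cloud import bigquery
    import pandas

    df = pandas.DataFrame({'my_string': ['a', 'b', 'c']})

    client = bigquery.Client()
    table_ref = client.dataset('TMP').table('yosh_try_uload_from_client')

    # load_table_from_dataframe returns a LoadJob; it does not block.
    job = client.load_table_from_dataframe(df, table_ref)

    # result() waits for completion and raises with the real error on failure.
    job.result()
    print(job.state, job.errors)  # 'DONE', None on success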

Adding a “calculated column” to a BigQuery query without repeating the calculations

Submitted by 夙愿已清 on 2021-01-20 13:44:08
Question: I want to reuse the values of calculated columns in a new third column. For example, this query works:

    SELECT
      COUNTIF(cond1) AS A,
      COUNTIF(cond2) AS B,
      COUNTIF(cond1) / COUNTIF(cond2) AS prct_pass
    FROM ... WHERE ... GROUP BY ...

But when I try to use A and B instead of repeating the COUNTIFs, it doesn't work because A and B are invalid:

    SELECT
      COUNTIF(cond1) AS A,
      COUNTIF(cond2) AS B,
      A / B AS prct_pass
    FROM ... WHERE ... GROUP BY ...

Can I somehow make the more readable second version work? Is the first one inefficient?

Answer 1:
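
The answer body is truncated above. BigQuery does not let a SELECT list reference aliases defined in the same SELECT, so the usual workaround is to wrap the aggregation in a subquery and compute the ratio in an outer query. A sketch via the Python client (cond1/cond2 stand-ins and the table name are hypothetical placeholders carried over from the question):

    from google.cloud import bigquery

    client = bigquery.Client()

    # A and B become real columns of the inner query, so the outer
    # SELECT can reuse them without repeating the COUNTIFs.
    query = """
    SELECT A, B, SAFE_DIVIDE(A, B) AS prct_pass
    FROM (
      SELECT COUNTIF(x > 0) AS A, COUNTIF(y > 0) AS B  -- cond1 / cond2 stand-ins
      FROM `project.dataset.table`                     -- hypothetical table
    )
    """
    for row in client.query(query).result():
        print(row.A, row.B, row.prct_pass)

As for efficiency, identical repeated aggregates are generally collapsed by the optimizer, so the first form is mainly a readability problem rather than a performance one.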

REGEXP_CONTAINS order and CASE statement in BigQuery

Submitted by 时光总嘲笑我的痴心妄想 on 2021-01-20 13:43:45
Question: I'm using a CASE statement with REGEXP_CONTAINS. I just wanted to check whether the following order gives me the correct output:

    (CASE
       WHEN REGEXP_CONTAINS(AdSet, '(?i)BUS') THEN "BUS"
       WHEN REGEXP_CONTAINS(AdSet, '(?i)BRA') THEN "BR"
       WHEN REGEXP_CONTAINS(AdSet, '(?i)DIG') THEN "TR"
       WHEN REGEXP_CONTAINS(AdSet, '(?i)INS') THEN "INS"
       WHEN REGEXP_CONTAINS(AdSet, '(?i)INV') THEN "INV"
       WHEN REGEXP_CONTAINS(AdSet, '(?i)SAV') THEN "SAV"
       WHEN REGEXP_CONTAINS(AdSet, '(?i)TRA') THEN "TR"
       WHEN REGEXP_CONTAINS
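
The snippet is truncated, but the underlying question can be answered directly: CASE evaluates its WHEN branches top to bottom and returns the result of the first match, so for a value matching more than one pattern the earlier branch wins. A small runnable check against made-up AdSet values (the column name comes from the question; the data is hypothetical):

    from google.cloud import bigquery

    client = bigquery.Client()

    # 'Savings-Travel' matches both (?i)SAV and (?i)TRA; the SAV branch
    # is listed first, so it wins. Reorder the WHENs if TRA should win.
    query = """
    SELECT AdSet,
      CASE
        WHEN REGEXP_CONTAINS(AdSet, '(?i)SAV') THEN 'SAV'
        WHEN REGEXP_CONTAINS(AdSet, '(?i)TRA') THEN 'TR'
        ELSE 'OTHER'
      END AS label
    FROM UNNEST(['Savings-Travel', 'Travel-Only', 'Brand']) AS AdSet
    """
    for row in client.query(query).result():
        print(row.AdSet, row.label)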

Write BigQuery results to GCS in CSV format using Apache Beam

Submitted by [亡魂溺海] on 2021-01-18 07:12:29
Question: I am pretty new to Apache Beam. I am trying to write a pipeline that extracts data from Google BigQuery and writes it to GCS in CSV format using Python. Using beam.io.Read(beam.io.BigQuerySource()) I am able to read the data from BigQuery, but I am not sure how to write it to GCS in CSV format. Is there a custom function to achieve this? Could you please help me?

    import logging
    import apache_beam as beam

    PROJECT = 'project_id'
    BUCKET = 'project_bucket'

    def run():
        argv = [ '
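
The snippet is cut off, but the missing piece is just a transform that turns each row dict into a CSV line, followed by beam.io.WriteToText. A sketch under the question's placeholders (project_id, project_bucket) with hypothetical columns name and value; for values that may contain commas or quotes, build each line with the csv module instead of a plain join:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    PROJECT = 'project_id'      # placeholders from the question
    BUCKET = 'project_bucket'

    def to_csv_line(row):
        # BigQuerySource yields dicts; 'name' and 'value' are hypothetical columns.
        return ','.join(str(row[col]) for col in ('name', 'value'))

    def run():
        options = PipelineOptions(project=PROJECT,
                                  temp_location='gs://%s/tmp' % BUCKET)
        with beam.Pipeline(options=options) as p:
            (p
             | 'Read' >> beam.io.Read(beam.io.BigQuerySource(
                   query='SELECT name, value FROM `project_id.dataset.table`',
                   use_standard_sql=True))
             | 'ToCsv' >> beam.Map(to_csv_line)
             | 'Write' >> beam.io.WriteToText(
                   'gs://%s/output/result' % BUCKET,
                   file_name_suffix='.csv',
                   header='name,value'))

    if __name__ == '__main__':
        run()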

BigQuery check entire table for null values

Submitted by 瘦欲@ on 2021-01-07 06:11:34
Question: Not sure if a reproducible example is necessary here. I have a big-ish and wide-ish table in BigQuery (10K rows x 100 cols) and I would like to know whether any columns have null values, and how many there are. Is there a query I can run that would return a one-row table indicating the number of null values in each column, and that doesn't require 100 IFNULL calls? Thanks!

Answer 1: Below is for BigQuery Standard SQL

    #standardSQL
    SELECT col_name, COUNT(1) nulls_count
    FROM `project.dataset
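
The answer is truncated above. A sketch of the usual TO_JSON_STRING technique that matches the truncated SELECT list (whether it is exactly the original answer can't be confirmed; `project.dataset.table` stands in for the real table): serialize each row to JSON, extract the keys whose value is null, and count them per column:

    from google.cloud import bigquery

    client = bigquery.Client()

    # Produces one output row per column that contains at least one NULL.
    query = r"""
    #standardSQL
    SELECT col_name, COUNT(1) AS nulls_count
    FROM `project.dataset.table` t,
    UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'"(\w+)":null')) AS col_name
    GROUP BY col_name
    """
    for row in client.query(query).result():
        print(row.col_name, row.nulls_count)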

BigQuery “FOR SYSTEM_TIME AS OF” feature guarantee for data recovery

Submitted by 五迷三道 on 2021-01-07 02:52:03
Question: In Google BigQuery, it is possible to retrieve the rows of a table (a snapshot) as they were in the past (at least within the last 7 days):

With legacy SQL, we can use snapshot decorators:

    #legacySQL
    SELECT * FROM [PROJECT_ID:DATASET.TABLE@-3600000]

With standard SQL, we can use FOR SYSTEM_TIME AS OF in the FROM clause:

    #standardSQL
    SELECT *
    FROM `PROJECT_ID.DATASET.TABLE`
      FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);

Both examples return a snapshot of PROJECT_ID.DATASET.TABLE from one hour ago.
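
For recovery purposes, the snapshot can be materialized into a new table so the old rows survive beyond the time-travel window; a sketch via the Python client (table names are the question's placeholders, and time travel is limited to roughly the last 7 days):

    from google.cloud import bigquery

    client = bigquery.Client()

    # Copy the one-hour-old snapshot into a recovery table.
    query = """
    CREATE OR REPLACE TABLE `PROJECT_ID.DATASET.TABLE_recovered` AS
    SELECT *
    FROM `PROJECT_ID.DATASET.TABLE`
      FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
    """
    client.query(query).result()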

Google Dataflow: insert + update in BigQuery in a streaming pipeline

Submitted by ≯℡__Kan透↙ on 2021-01-07 02:30:41
Question:

The main objective: a Python streaming pipeline in which I read the input from Pub/Sub. After the input is analyzed, two options are available:

    If x=1 -> insert
    If x=2 -> update

Testing: this cannot be done using Apache Beam functions, so you need to develop it using the 0.25 API of BigQuery (currently the version supported by Google Dataflow).

The problem: the inserted records are still in the BigQuery streaming buffer, so the UPDATE statement fails:

    UPDATE or DELETE statement over table table would affect rows in the streaming buffer, which is not supported
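
Rows typically sit in the streaming buffer for up to about 90 minutes, during which DML cannot touch them. A common workaround (a sketch; table and column names are hypothetical) is to stream both inserts and updates into an append-only staging table and periodically MERGE it into the main table once the rows have left the buffer:

    from google.cloud import bigquery

    client = bigquery.Client()

    # Run this on a schedule (e.g. hourly), after staged rows have
    # aged out of the streaming buffer.
    merge_sql = """
    MERGE `project.dataset.main` AS m
    USING (
      -- keep only the latest staged version of each id
      SELECT * EXCEPT(rn) FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY ts DESC) AS rn
        FROM `project.dataset.staging`
      ) WHERE rn = 1
    ) AS s
    ON m.id = s.id
    WHEN MATCHED THEN UPDATE SET m.value = s.value
    WHEN NOT MATCHED THEN INSERT (id, value) VALUES (s.id, s.value)
    """
    client.query(merge_sql).result()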