google-bigquery

Is there any other approach for updating a row in BigQuery apart from overwriting the table?

蹲街弑〆低调 submitted on 2020-01-12 05:44:06
Question: I have package data with some of its fields as follows:

    packageid    string
    status       string
    status_type  string
    scans        record (repeated)
        scanid     string
        status     string
        scannedby  string

Per day I have data for 100,000 packages, so the total package data size is about 100 MB per day and roughly 3 GB per month. Each package can receive 3-4 updates. So do I have to overwrite the package table every time a package update (e.g. just a change in the status field) comes in? Suppose I…
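One common alternative to rewriting the whole table is to load the day's changes into a staging table and apply them with a single MERGE statement. A minimal sketch in BigQuery Standard SQL, assuming hypothetical tables mydataset.packages (current state) and mydataset.package_updates (incoming changes) that share the schema above:

    MERGE `mydataset.packages` AS t
    USING `mydataset.package_updates` AS u
    ON t.packageid = u.packageid
    WHEN MATCHED THEN
      -- existing package: overwrite only the mutable fields
      UPDATE SET status = u.status, status_type = u.status_type, scans = u.scans
    WHEN NOT MATCHED THEN
      -- first time this package is seen: insert the full row
      INSERT (packageid, status, status_type, scans)
      VALUES (u.packageid, u.status, u.status_type, u.scans)

Batching the handful of daily updates per package into one MERGE per load keeps the number of DML statements small, which is generally cheaper and friendlier to BigQuery's DML quotas than issuing one UPDATE per package or rewriting the table.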

Get the top patent countries, codes in a BQ public dataset

陌路散爱 submitted on 2020-01-11 14:13:21
Question: I am trying to use an analytic function to get the top 2 countries with patent applications, and within those top 2 countries, get the top 2 application kinds. For example, the answer will look something like this:

    country  code
    US       P
    US       A
    GB       X
    GB       P

Here is the query I am using to get this:

    SELECT country_code, MIN(count_country_code) count_country_code, application_kind
    FROM (
      WITH A AS (
        SELECT country_code,
               COUNT(country_code) OVER (PARTITION BY country_code) AS count_country_code,…
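One way to express "top 2 countries, then top 2 kinds within each country" is to rank twice: once over country totals and once over kinds within each country. A sketch in Standard SQL, assuming a hypothetical source table mydataset.applications with country_code and application_kind columns:

    WITH counts AS (
      SELECT country_code, application_kind, COUNT(*) AS kind_count
      FROM `mydataset.applications`
      GROUP BY country_code, application_kind
    ),
    country_totals AS (
      SELECT country_code, SUM(kind_count) AS country_count
      FROM counts
      GROUP BY country_code
    ),
    ranked AS (
      SELECT
        c.country_code,
        c.application_kind,
        -- rank countries by their total number of applications
        DENSE_RANK() OVER (ORDER BY t.country_count DESC) AS country_rank,
        -- rank kinds within each country by their frequency
        ROW_NUMBER() OVER (PARTITION BY c.country_code ORDER BY c.kind_count DESC) AS kind_rank
      FROM counts AS c
      JOIN country_totals AS t USING (country_code)
    )
    SELECT country_code, application_kind AS code
    FROM ranked
    WHERE country_rank <= 2 AND kind_rank <= 2
    ORDER BY country_rank, kind_rank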

How to update table while joining on CTE results in BigQuery?

为君一笑 submitted on 2020-01-11 12:42:12
Question: I think it will be obvious what I'm trying to do from this simplified example (which works in PostgreSQL):

    with a as (
      select 1 as id, 123.456 as value
    )
    update mytable
    set value = coalesce(a1.value, a2.value)
    from a as a1, a as a2
    where a1.id = mytable.id or a2.id = mytable.id2

This is a simplified example. In reality the "a" expression is pretty complex and I need to join to it multiple times in the update expression. Is there a way to do this in one statement in BigQuery? Right now,…
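BigQuery does not accept a WITH clause directly in front of UPDATE, but MERGE lets the USING subquery be an arbitrary query, so the complex expression only has to be written once. A rough sketch of that idea, assuming the simplified table and ids from the question and assuming BigQuery accepts a WITH clause inside the USING subquery; note that MERGE raises an error if a target row matches more than one source row:

    MERGE mytable AS t
    USING (
      WITH a AS (
        SELECT 1 AS id, 123.456 AS value
      )
      -- join "a" to itself once here instead of repeating it in the UPDATE
      SELECT a1.id AS id1, a2.id AS id2,
             COALESCE(a1.value, a2.value) AS new_value
      FROM a AS a1
      CROSS JOIN a AS a2
    ) AS s
    ON t.id = s.id1 OR t.id2 = s.id2
    WHEN MATCHED THEN
      UPDATE SET value = s.new_value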

How to authenticate with service account and bigrquery package?

China☆狼群 submitted on 2020-01-11 11:19:15
Question: I have been able to authenticate using the JSON file associated with a service account using googleAuthR and bigQueryR:

    # Load packages
    global.packages <- c("bigQueryR", "googleAuthR")
    # Apply require on the list of packages; load them quietly
    lapply(global.packages, require, character.only = TRUE, quietly = TRUE)
    Sys.setenv("GCS_AUTH_FILE" = "json_file_location")
    # Authenticate Google BQ
    googleAuthR::gar_attach_auto_auth("https://www.googleapis.com/auth/bigquery", environment_var = "GCS…

BigQuery: back up all view definitions

不羁岁月 submitted on 2020-01-11 05:46:06
Question: I am working with BigQuery, and a few hundred views have been created. Most of these are not used and should be deleted. However, there is a chance that some are in use, so I cannot just blindly delete them all. Therefore, I need to back up all the view definitions somehow before deleting them. Does anyone know of a good way? I am not trying to save the data, just the view definition queries and their names. Thanks for reading!

Answer 1: Part 1. Issue the bq ls command. The --format flag can be used to…
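Besides the bq ls approach above, the view SQL can also be pulled out with a plain query against the dataset's INFORMATION_SCHEMA, which is easy to script per dataset before deleting anything. A sketch, assuming a hypothetical project and dataset name:

    -- lists every view in the dataset together with its defining SQL
    SELECT table_name, view_definition
    FROM `myproject.mydataset.INFORMATION_SCHEMA.VIEWS`
    ORDER BY table_name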

BigQuery async query job - the fetch_results() method returns wrong number of values

爷,独闯天下 submitted on 2020-01-11 05:23:05
Question: I am writing Python code with the BigQuery client API and attempting to use the async query code (shown everywhere as a code sample), and it is failing at the fetch_data() method call. Python errors out with:

    ValueError: too many values to unpack

So the 3 return values (rows, total_count, page_token) seem to be the incorrect number of return values. But I cannot find any documentation about what this method is supposed to return -- besides the numerous code examples that only…

How to Sync MySQL into BigQuery in Real Time?

不问归期 submitted on 2020-01-10 04:24:05
Question: Currently I have a script that first deletes the table and then uploads it from MySQL to BigQuery, and it has failed many times. Also, it runs only once a day. I am looking for a scalable, real-time solution. Your help will be much appreciated :)

Answer 1: Read this series of posts from WePay, where they detail how they sync their MySQL databases to BigQuery using Airflow: https://wecode.wepay.com/posts/wepays-data-warehouse-bigquery-airflow https://wecode.wepay.com/posts/airflow-wepay

Count unique IDs in a rolling time frame

与世无争的帅哥 submitted on 2020-01-10 04:01:04
Question: I have a simple table, as below, with lots of IDs and dates:

    ID     Date
    10R46  2014-11-23
    10R46  2016-04-11
    100R9  2016-12-21
    10R91  2013-05-03
    ...    ...

I want to formulate a query that counts the unique IDs in a rolling time frame of dates, for example ten days, meaning that for each date it should give me the number of unique IDs between that date and 10 days back. The result should look something like this:

    UniqueTenDays  Date
    200            2014-11-23
    324            2014-11-24
    522            2014-11-25
    532            2014-11-26
    ...            ...
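One straightforward way to express a rolling distinct count is to join every date back to the rows that fall in its trailing ten-day window and count distinct IDs per date. A sketch in Standard SQL, assuming a hypothetical table mydataset.events with ID and Date columns, where Date is of type DATE:

    SELECT
      d.Date,
      COUNT(DISTINCT e.ID) AS UniqueTenDays
    FROM (SELECT DISTINCT Date FROM `mydataset.events`) AS d
    JOIN `mydataset.events` AS e
      -- non-equi join: each date picks up all rows from its trailing 10-day window
      ON e.Date BETWEEN DATE_SUB(d.Date, INTERVAL 9 DAY) AND d.Date
    GROUP BY d.Date
    ORDER BY d.Date

The non-equi join grows roughly with rows times window size, which is usually fine at a daily grain; for very large tables, approximate counts with HLL_COUNT are a common substitute for exact DISTINCT.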

create a table with a column of type RECORD

╄→尐↘猪︶ㄣ submitted on 2020-01-09 11:54:11
Question: I'm using BigQuery and I want to create a job which populates a table with "record"-type columns. The data will be populated by a query, so how can I write a query which returns "record"-type columns? Thanks!

Answer 1: Somehow the option proposed by Pentium10 never worked for me in the GBQ UI or API Explorer; I might be missing something. Meantime, the workaround I found is as in the example below:

    SELECT location.state, location.city
    FROM JS(
      ( // input table
        SELECT NEST(CONCAT(state, ',', city)) AS…
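In Standard SQL the same result does not need the legacy JS()/NEST() workaround: ARRAY_AGG(STRUCT(...)) in a query produces a repeated RECORD column directly, and the query can feed a destination table or a CREATE TABLE AS SELECT. A sketch, assuming a hypothetical flat input table mydataset.locations with state and city columns:

    CREATE TABLE `mydataset.cities_by_state` AS
    SELECT
      state,
      ARRAY_AGG(STRUCT(city)) AS location  -- becomes a REPEATED RECORD column
    FROM `mydataset.locations`
    GROUP BY state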