google-bigquery

How to generate with scripting INTERVAL 1 <day|week|month>?

☆樱花仙子☆ submitted on 2020-02-15 10:10:30
Question: We are trying to find a syntax to generate the DAY|WEEK|MONTH options from the 3rd parameter of date functions. DECLARE var_date_option STRING DEFAULT 'DAY'; SELECT GENERATE_DATE_ARRAY('2019-01-01', '2020-01-01', INTERVAL 1 WEEK) -- dynamic param here -^^^ Do you know the proper syntax to use in DECLARE so that it is converted to valid SQL? Answer 1: Below is for BigQuery Standard SQL. Those DAY|WEEK|MONTH are literals and cannot be parameterized, and, as you know, dynamic SQL is also not
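Since the answer notes that DAY|WEEK|MONTH are literals that cannot be parameterized inside the query, one common workaround (a sketch, not from the answer itself) is to build the SQL string client-side, validating the date part against a whitelist before interpolating it:

```python
# Sketch: assemble the GENERATE_DATE_ARRAY query client-side.
# The function name and whitelist are illustrative assumptions.
ALLOWED_PARTS = {"DAY", "WEEK", "MONTH"}

def build_date_array_query(start, end, part):
    """Return a query string with the given interval part, if it is allowed."""
    part = part.upper()
    if part not in ALLOWED_PARTS:
        raise ValueError(f"unsupported date part: {part}")
    return (
        "SELECT GENERATE_DATE_ARRAY("
        f"'{start}', '{end}', INTERVAL 1 {part}) AS dates"
    )

print(build_date_array_query("2019-01-01", "2020-01-01", "week"))
```

The whitelist check matters because the part is spliced into the SQL text rather than bound as a parameter.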

Migrate hive table to Google BigQuery

喜欢而已 submitted on 2020-02-11 08:53:26
Question: I am trying to design a data pipeline to migrate my Hive tables into BigQuery. Hive is running on a Hadoop on-premises cluster. My current design is actually very simple; it is just a shell script: for each table source_hive_table { INSERT OVERWRITE TABLE target_avro_hive_table SELECT * FROM source_hive_table; Move the resulting Avro files into Google Cloud Storage using distcp Create first BQ table: bq load --source_format=AVRO your_dataset.something something.avro Handle
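The per-table loop described in the question can be sketched as a command generator; paths, bucket, and dataset names below are placeholder assumptions, not values from the question:

```python
# Sketch of the three per-table migration steps as shell commands:
# Hive export to Avro, distcp to GCS, then bq load.
def migration_commands(table, bucket="gs://my-bucket", dataset="your_dataset"):
    avro_table = f"{table}_avro"
    return [
        f'hive -e "INSERT OVERWRITE TABLE {avro_table} SELECT * FROM {table};"',
        f"hadoop distcp /warehouse/{avro_table} {bucket}/{avro_table}",
        f"bq load --source_format=AVRO {dataset}.{table} {bucket}/{avro_table}/*.avro",
    ]

for cmd in migration_commands("source_hive_table"):
    print(cmd)
```

Generating the commands first (rather than running them inline) makes it easy to dry-run and log the pipeline per table.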

How to Use DELETE and INSERT simultaneously in Query Script

微笑、不失礼 submitted on 2020-02-08 10:00:11
Question: I am using Google BigQuery (Standard SQL) and I have a table with some data. Based on the data in the table, I want to insert fake message rows into that table and, after that, delete all of the newly inserted fake messages from it, but I worry about accidentally deleting all of the data within that table. Any examples of how to properly query something like this? Answer 1: If you are worried about accidentally deleting data, I would create a view that combines your actual data with your fake data.
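The view the answer suggests can be sketched as a SQL string assembled in code; the project, dataset, and table names are assumptions for illustration, and the fake rows live in a separate table so the real table is never touched:

```python
# Sketch: a view UNIONing the real table with a separate fake-data table,
# tagging each side so fake rows can be filtered or dropped safely later.
def build_combined_view(project, dataset, real_table, fake_table, view_name):
    return (
        f"CREATE OR REPLACE VIEW `{project}.{dataset}.{view_name}` AS\n"
        f"SELECT *, FALSE AS is_fake FROM `{project}.{dataset}.{real_table}`\n"
        "UNION ALL\n"
        f"SELECT *, TRUE AS is_fake FROM `{project}.{dataset}.{fake_table}`"
    )

print(build_combined_view("my-project", "my_dataset", "messages",
                          "fake_messages", "messages_combined"))
```

With this layout, "deleting the fake data" is just truncating the fake table, which cannot affect the real rows.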

createFile() in google Apps Script is not functioning properly

谁都会走 submitted on 2020-02-08 09:20:19
Question: I am trying to create a file. It works fine when I run the following code segment from the debugger in Apps Script. However, when I run it in real time from the spreadsheet, it says I do not have permission to call createFile. Everything that is logged is identical. The issue is not that I lack authority, as I am the only one in the spreadsheet and am the owner. The purpose of the CSV is to move it from my Google Drive into data for BigQuery. function saveAsCSV(row) { //Doc to Csv //row = 3; /
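The underlying goal here, turning spreadsheet rows into a CSV that BigQuery can load, can be sketched outside Apps Script entirely. A minimal Python equivalent of the row-to-CSV step, with an assumed header (the real column names are not in the question):

```python
import csv
import io

# Sketch: serialize rows of values into CSV text suitable for bq load.
# The header columns are illustrative assumptions.
def rows_to_csv(rows, header=("id", "name", "value")):
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

print(rows_to_csv([[1, "alpha", 10], [2, "beta", 20]]))
```

Running the export outside the spreadsheet's trigger context also sidesteps the class of permission errors that simple triggers hit when calling services like DriveApp.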

BigQuery: Do clustered tables remain sorted in the face of streaming inserts? [duplicate]

耗尽温柔 submitted on 2020-02-07 03:12:52
Question: This question already has answers here: Why the cost of a query on today cluster/partition is much higher than on previous dates? (2 answers). Closed 10 months ago. I have hourly batch jobs that need to scan all the data that has streamed into my table in the last hour. Right now I'm using a date-partitioned table, which means that every time I scan a date partition for an hour's worth of data, I have to scan rows from all hours of that day. I've been thinking about clustering this table on
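The layout the question is considering, a date-partitioned table additionally clustered on a timestamp column, can be sketched as DDL assembled in code; the table and column names below are assumptions:

```python
# Sketch: DDL for a date-partitioned table clustered on its event
# timestamp, so hour-range filters can prune within a day's partition.
def clustered_table_ddl(table, partition_col="event_date",
                        cluster_col="event_ts"):
    return (
        f"CREATE TABLE `{table}` (\n"
        f"  {partition_col} DATE,\n"
        f"  {cluster_col} TIMESTAMP,\n"
        "  payload STRING\n"
        ")\n"
        f"PARTITION BY {partition_col}\n"
        f"CLUSTER BY {cluster_col}"
    )

print(clustered_table_ddl("my-project.my_dataset.events"))
```

Whether the clustering order survives ongoing streaming inserts is exactly what the linked duplicate discusses: recently streamed rows may not yet be fully re-clustered.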

Live data from BigQuery into a Python DataFrame

时光总嘲笑我的痴心妄想 submitted on 2020-02-06 08:49:12
Question: I am exploring ways to bring BigQuery data into Python; here is my code so far: from google.cloud import bigquery from pandas.io import gbq client = bigquery.Client.from_service_account_json("path_to_my.json") project_id = "my_project_name" query_job = client.query(""" #standardSQL SELECT date, SUM(totals.visits) AS visits FROM `projectname.dataset.ga_sessions_20*` AS t WHERE PARSE_DATE('%y%m%d', _table_suffix) BETWEEN DATE_SUB(current_date(), interval 3 day) AND DATE_SUB(current_date(),
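The query in the snippet is cut off mid-statement; a sketch of the same kind of `_table_suffix` window filter, built as a string so it can be inspected before being passed to `client.query(...)` (the exact window bounds and the `GROUP BY` are assumptions, since the original is truncated):

```python
from datetime import date, timedelta

# Sketch: build a standardSQL query over GA sessions sharded tables,
# filtering shards via _table_suffix to a recent date window.
def ga_sessions_query(days_back=3):
    start = date.today() - timedelta(days=days_back)
    end = date.today() - timedelta(days=1)
    return (
        "#standardSQL\n"
        "SELECT date, SUM(totals.visits) AS visits\n"
        "FROM `projectname.dataset.ga_sessions_20*` AS t\n"
        "WHERE PARSE_DATE('%y%m%d', _table_suffix)\n"
        f"  BETWEEN '{start:%Y-%m-%d}' AND '{end:%Y-%m-%d}'\n"
        "GROUP BY date"
    )

print(ga_sessions_query())
```

From there, `client.query(sql).to_dataframe()` is the usual route from a query job into a pandas DataFrame.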

How to remove duplicated row by timestamp in BigQuery?

て烟熏妆下的殇ゞ submitted on 2020-02-06 07:21:07
Question: I have a products table with the following schema: id, createdOn, updatedOn, stock, status. createdOn and updatedOn are TIMESTAMP, and createdOn is the partition field. Say this is the data I have now:
id createdOn updatedOn stock status
1 2018-09-14 14:14:24.305676 2018-09-14 14:14:24.305676 10 5
2 2018-09-14 14:14:24.305676 2018-09-14 14:14:24.305676 5 12
3 2018-09-14 14:14:24.305676 2018-09-14 14:14:24.305676 10 5
I have an ETL that appends new rows to this table. When the ETL is finished I can
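The dedup logic being asked for, keep only the row with the latest updatedOn per id, can be sketched in pure Python; in BigQuery itself the same idea is typically expressed with ROW_NUMBER() OVER (PARTITION BY id ORDER BY updatedOn DESC):

```python
# Sketch: keep the most recently updated row per id.
# Rows are dicts; timestamps compare correctly as ISO-format strings.
def latest_per_id(rows):
    best = {}
    for row in rows:
        current = best.get(row["id"])
        if current is None or row["updatedOn"] > current["updatedOn"]:
            best[row["id"]] = row
    return sorted(best.values(), key=lambda r: r["id"])
```

The same "pick rank 1 within each partition" shape is what a deduplicating SELECT over the table would produce.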

How to load specific date format into BigQuery

跟風遠走 submitted on 2020-02-05 03:47:47
Question: When loading a .csv file into BigQuery with dates in the format DD/MM/YY, it doesn't work if I specify the schema for the table and select the Date format. However, if I don't specify the schema and choose "Automatically detect", it works and converts the date format into YYYY-MM-DD. Is there any possibility of converting the date into the right format manually and specifying the name for that field? Thanks, David Answer 1: Unfortunately, there is no way to control date formatting from the load API.
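Since the load API offers no format control, the usual manual fix is to pre-process the file (or load the column as STRING and convert in SQL with PARSE_DATE('%d/%m/%y', field)). A minimal sketch of the pre-processing step:

```python
from datetime import datetime

# Sketch: rewrite a DD/MM/YY date string into the YYYY-MM-DD form
# that BigQuery's DATE type expects in CSV loads.
def to_bq_date(value):
    return datetime.strptime(value, "%d/%m/%y").strftime("%Y-%m-%d")

print(to_bq_date("14/09/18"))
```

Note that %y resolves two-digit years per strptime's pivot rule (00-68 become 20xx), which matches what automatic detection would infer for recent dates.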