google-bigquery

Command to import multiple files from Cloud Storage into BigQuery

£可爱£侵袭症+ submitted on 2021-01-29 09:36:54
Question: I've figured out that this command lists the paths to all files: gsutil ls "gs://bucket/foldername/*.csv" And this command imports a single file into BQ and autodetects the schema: bq load --autodetect --source_format=CSV dataset.tableName gs://bucket/foldername/something.csv Now I need to combine them so that all files are imported into their respective tables in BQ. If a table already exists, it should be replaced. Could you give me a hand?

Answer 1: First, create a file listing all the folders you want to load into BigQuery:
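
The answer is cut off above; as an alternative route to the same result, here is a minimal sketch using the google-cloud-storage and google-cloud-bigquery Python clients instead of a shell loop. The bucket, folder, and dataset names come from the question; deriving each table name from its CSV file name is an assumption.

from google.cloud import bigquery, storage

bq = bigquery.Client()
gcs = storage.Client()

bucket = "bucket"        # from the question; replace with your bucket
prefix = "foldername/"   # folder holding the CSV files
dataset = "dataset"      # target dataset

# WRITE_TRUNCATE replaces the table if it already exists, as the question asks.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

for blob in gcs.list_blobs(bucket, prefix=prefix):
    if not blob.name.endswith(".csv"):
        continue
    # Assumed naming scheme: one table per file, named after the file.
    table_name = blob.name.rsplit("/", 1)[-1][:-len(".csv")]
    uri = f"gs://{bucket}/{blob.name}"
    bq.load_table_from_uri(uri, f"{dataset}.{table_name}", job_config=job_config).result()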

Using BigQuery in Google Sheets, how do I give another user permission to press 'refresh'?

社会主义新天地 submitted on 2021-01-29 09:32:13
Question: I have BigQuery and Google Sheets permissions under a company-owned Google account. I have made a spreadsheet for a colleague whose data comes in via a BigQuery data connection, which creates a sheet in the spreadsheet with the results of that query. There's a panel at the bottom left that tells you how many rows were returned and what time the query was run, and it has a 'REFRESH' button. I can press the refresh button and it refreshes the data by running the query again. My colleague

Change comma to dot for sub-unitary values

有些话、适合烂在心里 submitted on 2021-01-29 08:23:36
Question: I have a column with decimal-comma values such as 1,6 and ,8. I tried the following query in BigQuery and it works for 1,6, but for ,8 the result is .8. How can I change it to the 0.8 number format? SELECT column_name, REPLACE(column_name,',','.') AS Price FROM table_name

Answer 1: This is a working example to format your data based on BigQuery's formatting syntax: WITH `table_name` AS ( SELECT '1.6' as column_name UNION ALL SELECT '.8' ) SELECT column_name, format("%g",CAST(REPLACE(column_name,',','
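
The answer's query is truncated above; the following is a rough sketch of the same idea (REPLACE, then CAST and FORMAT so that ,8 comes out as 0.8), run through the BigQuery Python client. The table and column names are the placeholders from the question.

from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT
  column_name,
  -- CAST('.8' AS FLOAT64) = 0.8, and FORMAT('%g', ...) renders it as '0.8'
  FORMAT('%g', CAST(REPLACE(column_name, ',', '.') AS FLOAT64)) AS Price
FROM `table_name`
"""

for row in client.query(sql).result():
    print(row.column_name, row.Price)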

What format does BigQuery timestamp take?

落爺英雄遲暮 submitted on 2021-01-29 08:21:12
Question: I am trying to input a TIMESTAMP value into BigQuery as an RFC 3339 string: "2019-07-25T11:07:41-04:00". It doesn't seem to be working. What format does the TIMESTAMP type expect in BigQuery? The documentation doesn't specify the input format.

Answer 1: That is a valid format for casting or coercion, but be mindful that leading and trailing whitespace (or whitespace before the timezone offset, which many systems add) will cause the coercion to fail. I would suggest something along the lines of (using TIMESTAMP()):
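
A small sketch of that suggestion, run through the BigQuery Python client: the RFC 3339 string from the question parses directly with TIMESTAMP(), and TRIM() guards against the stray whitespace the answer warns about.

from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT
  TIMESTAMP('2019-07-25T11:07:41-04:00')           AS parsed_directly,
  TIMESTAMP(TRIM('  2019-07-25T11:07:41-04:00  ')) AS parsed_after_trim
"""

for row in client.query(sql).result():
    print(row.parsed_directly, row.parsed_after_trim)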

How to fill irregularly missing values with linear interpolation in BigQuery?

空扰寡人 submitted on 2021-01-29 08:16:00
Question: I have data with irregularly missing values, and I'd like to resample it to a fixed interval with linear interpolation using BigQuery Standard SQL. Specifically, I have data like this:

# data is missing irregularly
+------+-------+
| time | value |
+------+-------+
| 1    | 3.0   |
| 5    | 5.0   |
| 7    | 1.0   |
| 9    | 8.0   |
| 10   | 4.0   |
+------+-------+

and I'd like to convert this table as follows:

# interpolated with an interval of 1
+------+--------------------+
| time | value_interpolated |
+------
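
The question is truncated above. Below is one possible sketch of the interpolation (not necessarily the approach the original answer took): GENERATE_ARRAY builds the dense time grid, and IGNORE NULLS window functions find the surrounding known points, all wrapped in the Python client.

from google.cloud import bigquery

client = bigquery.Client()

sql = """
WITH sample AS (               -- the sample data from the question
  SELECT 1 AS time, 3.0 AS value UNION ALL
  SELECT 5, 5.0 UNION ALL SELECT 7, 1.0 UNION ALL
  SELECT 9, 8.0 UNION ALL SELECT 10, 4.0
),
grid AS (                      -- dense time axis with an interval of 1
  SELECT t AS time
  FROM UNNEST(GENERATE_ARRAY((SELECT MIN(time) FROM sample),
                             (SELECT MAX(time) FROM sample))) AS t
),
neighbors AS (                 -- previous and next known points for every row
  SELECT
    g.time,
    s.value,
    LAST_VALUE(s.value IGNORE NULLS)
      OVER (ORDER BY g.time) AS prev_value,
    LAST_VALUE(IF(s.value IS NULL, NULL, g.time) IGNORE NULLS)
      OVER (ORDER BY g.time) AS prev_time,
    FIRST_VALUE(s.value IGNORE NULLS)
      OVER (ORDER BY g.time ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS next_value,
    FIRST_VALUE(IF(s.value IS NULL, NULL, g.time) IGNORE NULLS)
      OVER (ORDER BY g.time ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS next_time
  FROM grid g LEFT JOIN sample s USING (time)
)
SELECT
  time,
  IF(value IS NOT NULL, value,
     prev_value + (next_value - prev_value)
                  * (time - prev_time) / (next_time - prev_time)) AS value_interpolated
FROM neighbors
ORDER BY time
"""

for row in client.query(sql).result():
    print(row.time, row.value_interpolated)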

Spark writing Parquet array<string> converts to a different datatype when loading into BigQuery

拥有回忆 submitted on 2021-01-29 07:37:03
Question: Spark DataFrame schema:

StructType([
  StructField("a", StringType(), False),
  StructField("b", StringType(), True),
  StructField("c", BinaryType(), False),
  StructField("d", ArrayType(StringType(), False), True),
  StructField("e", TimestampType(), True)
])

When I write the DataFrame to Parquet and load it into BigQuery, the schema is interpreted differently. It is a simple load from JSON and a write to Parquet using a Spark DataFrame. BigQuery schema: [ { "type": "STRING", "name": "a", "mode":
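
The BigQuery schema in the question is cut off, but the symptom described is typically the array<string> column arriving as a nested RECORD (list.element) because of Parquet's three-level list encoding. A hedged sketch of one way to load such files with the BigQuery Python client, assuming a client version that exposes ParquetOptions; the URI and table name are placeholders. The bq CLI has an equivalent --parquet_enable_list_inference flag.

from google.cloud import bigquery

client = bigquery.Client()

# Ask BigQuery to infer Parquet LIST groups as REPEATED columns, so the
# Spark-written array<string> field "d" loads as ARRAY<STRING> rather than
# a RECORD containing list.element.
parquet_options = bigquery.format_options.ParquetOptions()
parquet_options.enable_list_inference = True

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    parquet_options=parquet_options,
)

load_job = client.load_table_from_uri(
    "gs://bucket/path/part-*.parquet",   # placeholder URI
    "dataset.table",                     # placeholder destination
    job_config=job_config,
)
load_job.result()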

BigQuery exceeding CPU limits

好久不见. submitted on 2021-01-29 07:25:37
Question: I keep getting the error "Query exceeded resource limits. 2730.807817954678 CPU seconds were used, and this query must use less than 2500.0 CPU seconds. at [2:3]". At first I was running this query:

create temp table data as select * from table left join othertable using(caseid);

EXECUTE IMMEDIATE ( SELECT """ SELECT caseid, """ || STRING_AGG("""MAX(IF(code = '""" || code || """', 1, 0)) AS _""" || REPLACE(code, '.', '_'), ', ') || """ FROM data GROUP BY caseid """ FROM ( SELECT DISTINCT code
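
The question is truncated above. One way to get more control over where the CPU goes is to build the pivot on the client rather than with EXECUTE IMMEDIATE, so the generated column list can be split into smaller batches, each written to its own destination table. A rough sketch with the Python client, assuming fully qualified placeholder names for the tables referenced in the question:

from google.cloud import bigquery
import re

client = bigquery.Client()

# Fetch the distinct codes once, then generate MAX(IF(...)) columns in batches.
codes = [row.code for row in client.query(
    "SELECT DISTINCT code FROM `project.dataset.othertable`").result()]

def column_expr(code):
    safe = re.sub(r"\W", "_", code)                 # sanitize the column alias
    return f"MAX(IF(code = '{code}', 1, 0)) AS _{safe}"

batch_size = 500                                    # tune so each query stays under the limit
for i in range(0, len(codes), batch_size):
    batch = codes[i:i + batch_size]
    sql = f"""
    SELECT caseid, {', '.join(column_expr(c) for c in batch)}
    FROM `project.dataset.table`
    LEFT JOIN `project.dataset.othertable` USING (caseid)
    GROUP BY caseid
    """
    dest = f"project.dataset.pivot_part_{i // batch_size}"
    job_config = bigquery.QueryJobConfig(destination=dest,
                                         write_disposition="WRITE_TRUNCATE")
    client.query(sql, job_config=job_config).result()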

GCE RAM and CPU usage in BigQuery

别等时光非礼了梦想. submitted on 2021-01-29 06:07:57
Question: Is there a way to export Google Compute Engine instance logs into BigQuery so that the exported logs can be queried for CPU and RAM usage over a selected period, filtered by instance label? I have already reviewed the Default Logging Agent logs documentation, which shows what Stackdriver Logging collects, but RAM and CPU usage aren't mentioned. I also found Viewing Activity Logs and Exporting with the Logs Viewer, but neither is relevant to my need. Thanks in advance.

Answer 1: Stackdriver has some
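
The answer above is cut off. For what it's worth, CPU and RAM usage are Cloud Monitoring (Stackdriver) metrics rather than log entries, so one option is to pull the time series from the Monitoring API and load them into BigQuery yourself. A rough sketch follows, with placeholder project and table names; memory metrics additionally require the monitoring/Ops agent on the instance.

import time
from google.cloud import bigquery, monitoring_v3

project = "my-project"                      # placeholder project id
client = monitoring_v3.MetricServiceClient()

now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 3600}, "end_time": {"seconds": now}}
)

# CPU utilization for all GCE instances over the last hour.
series = client.list_time_series(
    request={
        "name": f"projects/{project}",
        "filter": 'metric.type = "compute.googleapis.com/instance/cpu/utilization"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

rows = [
    {
        "instance_id": ts.resource.labels["instance_id"],
        "ts": point.interval.end_time.isoformat(),
        "cpu_utilization": point.value.double_value,
    }
    for ts in series
    for point in ts.points
]

# The destination table must already exist with a matching schema.
bigquery.Client().insert_rows_json(f"{project}.monitoring.gce_cpu", rows)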

How to ignore an unknown column when loading to BigQuery using Airflow?

↘锁芯ラ submitted on 2021-01-29 05:21:08
Question: I'm loading data from Google Cloud Storage into BigQuery using GoogleCloudStorageToBigQueryOperator. The JSON file may have more columns than I defined in the schema. In that case I want the load job to continue and simply ignore the unrecognized columns. I tried to use the ignore_unknown_values argument, but it didn't make any difference. My operator:

def dc():
    return [
        { "name": "id", "type": "INTEGER", "mode": "NULLABLE" },
        { "name": "storeId", "type": "INTEGER", "mode": "NULLABLE" },
        ...
    ]

gcs_to
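
The operator definition is cut off above. A hedged sketch of how the load might be configured: pass ignore_unknown_values=True and, as a workaround that has been reported for some Airflow versions, also forward the flag through src_fmt_configs. Everything except dc() (the schema helper from the question) is a placeholder, and the import path depends on the Airflow version.

from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator

gcs_to_bq = GoogleCloudStorageToBigQueryOperator(
    task_id="gcs_to_bq",
    bucket="my-bucket",                                  # placeholder
    source_objects=["path/to/file.json"],                # placeholder
    destination_project_dataset_table="project.dataset.table",
    schema_fields=dc(),                                  # schema helper from the question
    source_format="NEWLINE_DELIMITED_JSON",
    write_disposition="WRITE_APPEND",
    ignore_unknown_values=True,
    # Some versions reportedly only honour the flag when it is also passed this way.
    src_fmt_configs={"ignoreUnknownValues": True},
    dag=dag,                                             # the DAG this task belongs to
)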