google-bigquery

Command to import multiple files from Cloud Storage into BigQuery

£可爱£侵袭症+ submitted on 2021-01-29 09:36:54
Question: I've figured out that this command lists the paths to all files: gsutil ls "gs://bucket/foldername/*.csv" And this command imports a single file into BQ and autodetects the schema: bq load --autodetect --source_format=CSV dataset.tableName gs://bucket/foldername/something.csv Now I need to combine them so that all files are imported into their respective tables in BQ. If a table already exists, it should be replaced. Could you give me a hand?

Answer 1: First, create a file listing all the folders you want to load into BigQuery:
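
The answer is cut off above; as an alternative route to the same result, here is a minimal sketch using the google-cloud-storage and google-cloud-bigquery Python clients instead of a shell loop. The bucket, folder, and dataset names come from the question; deriving each table name from its CSV file name is an assumption.

from google.cloud import bigquery, storage

bq = bigquery.Client()
gcs = storage.Client()

bucket = "bucket"        # from the question; replace with your bucket
prefix = "foldername/"   # folder holding the CSV files
dataset = "dataset"      # target dataset

# WRITE_TRUNCATE replaces the table if it already exists, as the question asks.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

for blob in gcs.list_blobs(bucket, prefix=prefix):
    if not blob.name.endswith(".csv"):
        continue
    # Assumed naming scheme: one table per file, named after the file.
    table_name = blob.name.rsplit("/", 1)[-1][:-len(".csv")]
    uri = f"gs://{bucket}/{blob.name}"
    bq.load_table_from_uri(uri, f"{dataset}.{table_name}", job_config=job_config).result()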

Using BigQuery in Google Sheets, how do I give another user permission to press 'refresh'?

社会主义新天地 submitted on 2021-01-29 09:32:13
Question: I have BigQuery and Google Sheets permissions under a company-owned Google account. I have made a spreadsheet for a colleague whose data comes in via a BigQuery data connection, which creates a sheet in the spreadsheet with the results of that query. There's a panel at the bottom left that tells you how many rows were returned and what time the query was run, and it has a 'REFRESH' button. I can press the refresh button and it refreshes the data by running the query again. My colleague

Change comma to dot for sub-unitary values

有些话、适合烂在心里 submitted on 2021-01-29 08:23:36
Question: I have a column with decimal-comma values such as 1,6 and ,8. I tried the following query in BigQuery and it works for 1,6, but for ,8 the result is .8. How can I change it to the 0.8 number format? SELECT column_name, REPLACE(column_name,',','.') AS Price FROM table_name

Answer 1: This is a working example to format your data based on BigQuery's formatting syntax: WITH `table_name` AS ( SELECT '1.6' as column_name UNION ALL SELECT '.8' ) SELECT column_name, format("%g",CAST(REPLACE(column_name,',','
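
The answer's query is truncated above; the following is a rough sketch of the same idea (REPLACE, then CAST and FORMAT so that ,8 comes out as 0.8), run through the BigQuery Python client. The table and column names are the placeholders from the question.

from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT
  column_name,
  -- CAST('.8' AS FLOAT64) = 0.8, and FORMAT('%g', ...) renders it as '0.8'
  FORMAT('%g', CAST(REPLACE(column_name, ',', '.') AS FLOAT64)) AS Price
FROM `table_name`
"""

for row in client.query(sql).result():
    print(row.column_name, row.Price)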

What format does BigQuery timestamp take?

落爺英雄遲暮 submitted on 2021-01-29 08:21:12
Question: I am trying to input a TIMESTAMP value into BigQuery as an RFC 3339 string: "2019-07-25T11:07:41-04:00". It doesn't seem to be working. What format does the TIMESTAMP type expect in BigQuery? The documentation doesn't specify the input format.

Answer 1: That is a valid format for casting or coercion, but be mindful that leading and trailing whitespace (or whitespace before the timezone offset, which many systems add) will cause the coercion to fail. I would suggest something along the lines of (using TIMESTAMP()):
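
A small sketch of that suggestion, run through the BigQuery Python client: the RFC 3339 string from the question parses directly with TIMESTAMP(), and TRIM() guards against the stray whitespace the answer warns about.

from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT
  TIMESTAMP('2019-07-25T11:07:41-04:00')           AS parsed_directly,
  TIMESTAMP(TRIM('  2019-07-25T11:07:41-04:00  ')) AS parsed_after_trim
"""

for row in client.query(sql).result():
    print(row.parsed_directly, row.parsed_after_trim)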

How to fill irregularly missing values with linear interpolation in BigQuery?

空扰寡人 submitted on 2021-01-29 08:16:00
Question: I have data with irregularly missing values, and I'd like to resample it to a fixed interval with linear interpolation using BigQuery Standard SQL. Specifically, I have data like this:

# data is missing irregularly
+------+-------+
| time | value |
+------+-------+
| 1    | 3.0   |
| 5    | 5.0   |
| 7    | 1.0   |
| 9    | 8.0   |
| 10   | 4.0   |
+------+-------+

and I'd like to convert this table as follows:

# interpolated with an interval of 1
+------+--------------------+
| time | value_interpolated |
+------
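
The question is truncated above. Below is one possible sketch of the interpolation (not necessarily the approach the original answer took): GENERATE_ARRAY builds the dense time grid, and IGNORE NULLS window functions find the surrounding known points, all wrapped in the Python client.

from google.cloud import bigquery

client = bigquery.Client()

sql = """
WITH sample AS (               -- the sample data from the question
  SELECT 1 AS time, 3.0 AS value UNION ALL
  SELECT 5, 5.0 UNION ALL SELECT 7, 1.0 UNION ALL
  SELECT 9, 8.0 UNION ALL SELECT 10, 4.0
),
grid AS (                      -- dense time axis with an interval of 1
  SELECT t AS time
  FROM UNNEST(GENERATE_ARRAY((SELECT MIN(time) FROM sample),
                             (SELECT MAX(time) FROM sample))) AS t
),
neighbors AS (                 -- previous and next known points for every row
  SELECT
    g.time,
    s.value,
    LAST_VALUE(s.value IGNORE NULLS)
      OVER (ORDER BY g.time) AS prev_value,
    LAST_VALUE(IF(s.value IS NULL, NULL, g.time) IGNORE NULLS)
      OVER (ORDER BY g.time) AS prev_time,
    FIRST_VALUE(s.value IGNORE NULLS)
      OVER (ORDER BY g.time ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS next_value,
    FIRST_VALUE(IF(s.value IS NULL, NULL, g.time) IGNORE NULLS)
      OVER (ORDER BY g.time ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS next_time
  FROM grid g LEFT JOIN sample s USING (time)
)
SELECT
  time,
  IF(value IS NOT NULL, value,
     prev_value + (next_value - prev_value)
                  * (time - prev_time) / (next_time - prev_time)) AS value_interpolated
FROM neighbors
ORDER BY time
"""

for row in client.query(sql).result():
    print(row.time, row.value_interpolated)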

Spark writing Parquet array<string> converts to a different datatype when loading into BigQuery

拥有回忆 submitted on 2021-01-29 07:37:03
Question: Spark DataFrame schema:

StructType([
  StructField("a", StringType(), False),
  StructField("b", StringType(), True),
  StructField("c", BinaryType(), False),
  StructField("d", ArrayType(StringType(), False), True),
  StructField("e", TimestampType(), True)
])

When I write the DataFrame to Parquet and load it into BigQuery, the schema is interpreted differently. It is a simple load from JSON and a write to Parquet using a Spark DataFrame. BigQuery schema: [ { "type": "STRING", "name": "a", "mode":
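
The BigQuery schema in the question is cut off, but the symptom described is typically the array<string> column arriving as a nested RECORD (list.element) because of Parquet's three-level list encoding. A hedged sketch of one way to load such files with the BigQuery Python client, assuming a client version that exposes ParquetOptions; the URI and table name are placeholders. The bq CLI has an equivalent --parquet_enable_list_inference flag.

from google.cloud import bigquery

client = bigquery.Client()

# Ask BigQuery to infer Parquet LIST groups as REPEATED columns, so the
# Spark-written array<string> field "d" loads as ARRAY<STRING> rather than
# a RECORD containing list.element.
parquet_options = bigquery.format_options.ParquetOptions()
parquet_options.enable_list_inference = True

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    parquet_options=parquet_options,
)

load_job = client.load_table_from_uri(
    "gs://bucket/path/part-*.parquet",   # placeholder URI
    "dataset.table",                     # placeholder destination
    job_config=job_config,
)
load_job.result()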

BigQuery exceeding CPU limits

好久不见. submitted on 2021-01-29 07:25:37
Question: I keep getting the error "Query exceeded resource limits. 2730.807817954678 CPU seconds were used, and this query must use less than 2500.0 CPU seconds. at [2:3]". At first I was running this query:

create temp table data as select * from table left join othertable using(caseid);

EXECUTE IMMEDIATE ( SELECT """ SELECT caseid, """ || STRING_AGG("""MAX(IF(code = '""" || code || """', 1, 0)) AS _""" || REPLACE(code, '.', '_'), ', ') || """ FROM data GROUP BY caseid """ FROM ( SELECT DISTINCT code
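
The question is truncated above. One way to get more control over where the CPU goes is to build the pivot on the client rather than with EXECUTE IMMEDIATE, so the generated column list can be split into smaller batches, each written to its own destination table. A rough sketch with the Python client, assuming fully qualified placeholder names for the tables referenced in the question:

from google.cloud import bigquery
import re

client = bigquery.Client()

# Fetch the distinct codes once, then generate MAX(IF(...)) columns in batches.
codes = [row.code for row in client.query(
    "SELECT DISTINCT code FROM `project.dataset.othertable`").result()]

def column_expr(code):
    safe = re.sub(r"\W", "_", code)                 # sanitize the column alias
    return f"MAX(IF(code = '{code}', 1, 0)) AS _{safe}"

batch_size = 500                                    # tune so each query stays under the limit
for i in range(0, len(codes), batch_size):
    batch = codes[i:i + batch_size]
    sql = f"""
    SELECT caseid, {', '.join(column_expr(c) for c in batch)}
    FROM `project.dataset.table`
    LEFT JOIN `project.dataset.othertable` USING (caseid)
    GROUP BY caseid
    """
    dest = f"project.dataset.pivot_part_{i // batch_size}"
    job_config = bigquery.QueryJobConfig(destination=dest,
                                         write_disposition="WRITE_TRUNCATE")
    client.query(sql, job_config=job_config).result()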

GCE RAM and CPU usage in BigQuery

别等时光非礼了梦想. submitted on 2021-01-29 06:07:57
Question: Is there a way to export Google Compute Engine instance logs into BigQuery so that the exported logs can be queried for CPU and RAM usage over a selected period, filtered by instance label? I have already reviewed the Default Logging Agent logs documentation, which shows what Stackdriver Logging collects, but RAM and CPU usage aren't mentioned. I also found Viewing Activity Logs and Exporting with the Logs Viewer, but neither is relevant to my need. Thanks in advance.

Answer 1: Stackdriver has some
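
The answer above is cut off. For what it's worth, CPU and RAM usage are Cloud Monitoring (Stackdriver) metrics rather than log entries, so one option is to pull the time series from the Monitoring API and load them into BigQuery yourself. A rough sketch follows, with placeholder project and table names; memory metrics additionally require the monitoring/Ops agent on the instance.

import time
from google.cloud import bigquery, monitoring_v3

project = "my-project"                      # placeholder project id
client = monitoring_v3.MetricServiceClient()

now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 3600}, "end_time": {"seconds": now}}
)

# CPU utilization for all GCE instances over the last hour.
series = client.list_time_series(
    request={
        "name": f"projects/{project}",
        "filter": 'metric.type = "compute.googleapis.com/instance/cpu/utilization"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

rows = [
    {
        "instance_id": ts.resource.labels["instance_id"],
        "ts": point.interval.end_time.isoformat(),
        "cpu_utilization": point.value.double_value,
    }
    for ts in series
    for point in ts.points
]

# The destination table must already exist with a matching schema.
bigquery.Client().insert_rows_json(f"{project}.monitoring.gce_cpu", rows)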

How to ignore an unknown column when loading to BigQuery using Airflow?

↘锁芯ラ submitted on 2021-01-29 05:21:08
Question: I'm loading data from Google Cloud Storage into BigQuery using GoogleCloudStorageToBigQueryOperator. The JSON file may have more columns than I defined in the schema. In that case I want the load job to continue and simply ignore the unrecognized columns. I tried to use the ignore_unknown_values argument, but it didn't make any difference. My operator:

def dc():
    return [
        { "name": "id", "type": "INTEGER", "mode": "NULLABLE" },
        { "name": "storeId", "type": "INTEGER", "mode": "NULLABLE" },
        ...
    ]

gcs_to
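
The operator definition is cut off above. A hedged sketch of how the load might be configured: pass ignore_unknown_values=True and, as a workaround that has been reported for some Airflow versions, also forward the flag through src_fmt_configs. Everything except dc() (the schema helper from the question) is a placeholder, and the import path depends on the Airflow version.

from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator

gcs_to_bq = GoogleCloudStorageToBigQueryOperator(
    task_id="gcs_to_bq",
    bucket="my-bucket",                                  # placeholder
    source_objects=["path/to/file.json"],                # placeholder
    destination_project_dataset_table="project.dataset.table",
    schema_fields=dc(),                                  # schema helper from the question
    source_format="NEWLINE_DELIMITED_JSON",
    write_disposition="WRITE_APPEND",
    ignore_unknown_values=True,
    # Some versions reportedly only honour the flag when it is also passed this way.
    src_fmt_configs={"ignoreUnknownValues": True},
    dag=dag,                                             # the DAG this task belongs to
)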