google-bigquery

How to deal with semantic version data type in BigQuery

Submitted by ↘锁芯ラ on 2021-02-02 09:59:39
Question: I know that there is no semantic-version data type in BigQuery. How would you handle semantic versions in BigQuery? I have the following schema: software:string, software_version:string. The software_version column is a string, but the data I store there is in semver format: MAJOR.MINOR.PATCH-prerelease. In particular, I want to use the comparison operators <, >, and =. For example:

select '4.0.0' < '4.0.0-beta'

This returns true, but according to the semver definition it should be false, because the - character introduces a pre-release, and a pre-release has lower precedence than the corresponding release.

Answer 1: Below …
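Since the answer is cut off above, the comparison logic itself can be sketched outside SQL. Below is a minimal Python sketch of SemVer 2.0.0 precedence (the function name `semver_key` and the sample version list are my own; a BigQuery solution would have to express the same decomposition in SQL):

```python
def semver_key(version):
    """Sort key implementing SemVer 2.0.0 precedence for MAJOR.MINOR.PATCH
    versions with an optional -prerelease suffix."""
    core, _, pre = version.partition('-')
    core = core.split('+')[0]  # build metadata ("+...") is ignored for precedence
    pre = pre.split('+')[0]
    major, minor, patch = (int(x) for x in core.split('.'))
    if not pre:
        # A plain release has higher precedence than any of its pre-releases.
        return (major, minor, patch, (1,))
    identifiers = tuple(
        (0, int(ident), '') if ident.isdigit()  # numeric ids compare numerically
        else (1, 0, ident)                      # and rank below alphanumeric ones
        for ident in pre.split('.')
    )
    return (major, minor, patch, (0, identifiers))

versions = ['4.0.0', '4.0.0-beta', '1.0.0', '1.0.0-rc.1', '1.0.0-alpha']
print(sorted(versions, key=semver_key))
# → ['1.0.0-alpha', '1.0.0-rc.1', '1.0.0', '4.0.0-beta', '4.0.0']
```

With this key, '4.0.0-beta' correctly sorts before '4.0.0', unlike the plain string comparison in the question.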

BigQuery Scheduled Data Transfer throws “Incompatible table partitioning specification.” Error - but error message is truncated

Submitted by 强颜欢笑 on 2021-01-29 20:06:18
Question: I'm using the new BigQuery Data Transfer UI, and upon scheduling a Data Transfer, the transfer fails. The error message in Run History isn't terribly helpful, as it seems truncated:

Incompatible table partitioning specification. Expects partitioning specification interval(type:hour), but input partitioning specification is ; JobID: xxxxxxxxxxxx

Note the part of the error that says "...but input partitioning specification is..." with nothing before the semicolon. It seems this error is …

How to convert UTC time to local timezones based on a timezone column in BigQuery?

Submitted by 谁说胖子不能爱 on 2021-01-29 17:29:33
Question: I've been trying to convert each UTC timestamp back to the appropriate local timezone using standard SQL in BigQuery, but couldn't find a good way to do it dynamically, because there may be many different timezone names in the database. Does anyone have an idea? The table contains two columns (see screenshot).

Answer 1: Below example is for BigQuery Standard SQL:

#standardSQL
WITH `project.dataset.yourtable` AS (
  SELECT 'Pacific/Honolulu' timezone, TIMESTAMP '2020-03-01 03:41:27 …
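The same per-row conversion can be checked outside SQL with Python's standard zoneinfo module (the sample rows mirror the answer's Pacific/Honolulu example; the column layout is an assumption based on the question):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# (timezone, utc_timestamp) pairs, mirroring the two-column table in the question
rows = [
    ('Pacific/Honolulu', datetime(2020, 3, 1, 3, 41, 27, tzinfo=timezone.utc)),
    ('America/New_York', datetime(2020, 3, 1, 3, 41, 27, tzinfo=timezone.utc)),
]

for tz_name, utc_ts in rows:
    # astimezone applies the named zone's UTC offset (including its DST rules)
    local_ts = utc_ts.astimezone(ZoneInfo(tz_name))
    print(tz_name, local_ts.isoformat())
# Pacific/Honolulu is UTC-10 year-round, so the first row prints
# 2020-02-29T17:41:27-10:00
```

In BigQuery itself, the key point is that the time-zone argument of functions like DATETIME(timestamp_expression, time_zone) can be a column reference, which is what makes a dynamic, per-row conversion possible without hard-coding zone names.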

Return the highest count record

Submitted by 我的未来我决定 on 2021-01-29 16:43:33
Question: The data I am working on looks like this:

A_ID  B_ID  count
123   abcd  1000
123   aaaa  2000
123   aaaa  3000
456   null  50
456   bbbb  6000
456   cccc  450

I want to extract the B_ID that has the highest count for a given A_ID. The result should look like:

A_ID  B_ID  count
123   aaaa  3000
456   bbbb  6000

How do I achieve this result?

Answer 1: One option is to filter with a subquery:

select t.*
from mytable t
where t.count = (select max(t1.count) from mytable t1 where t1.a_id = t.a_id)

You can also use window …
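The subquery keeps every row whose count equals its group's maximum (so exact ties would return multiple rows per A_ID). The same logic, sketched as a single pass in Python over the question's sample data:

```python
rows = [
    {'a_id': 123, 'b_id': 'abcd', 'count': 1000},
    {'a_id': 123, 'b_id': 'aaaa', 'count': 2000},
    {'a_id': 123, 'b_id': 'aaaa', 'count': 3000},
    {'a_id': 456, 'b_id': None,   'count': 50},
    {'a_id': 456, 'b_id': 'bbbb', 'count': 6000},
    {'a_id': 456, 'b_id': 'cccc', 'count': 450},
]

# Keep, per a_id, the row with the highest count (first row wins on a tie).
best = {}
for row in rows:
    if row['a_id'] not in best or row['count'] > best[row['a_id']]['count']:
        best[row['a_id']] = row

result = sorted(best.values(), key=lambda r: r['a_id'])
print(result)
# → [{'a_id': 123, 'b_id': 'aaaa', 'count': 3000},
#    {'a_id': 456, 'b_id': 'bbbb', 'count': 6000}]
```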

IF Conditional to Run a Scheduled Query

Submitted by て烟熏妆下的殇ゞ on 2021-01-29 15:59:22
Question: I'm using BigQuery. I have a scheduled query that generates a table (RESULT TABLE) which depends on another table (SOURCE TABLE). The catch is that this source table doesn't always have data; it may be empty. I want the scheduled query to build the RESULT TABLE only if there is data in the SOURCE TABLE. In pseudocode:

IF (COUNT(1) FROM data.source_table) > 0
THEN RUN: SELECT * FROM data.source_table LEFT JOIN data.other_source_table
ELSE [Don't Run]

Thanks in …
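BigQuery scripting does support IF ... THEN ... END IF inside a scheduled query, which is one way to express this guard. The intended control flow can be sketched in Python (the function name, the dict-keyed join, and the sample data are illustrative stand-ins for the question's tables):

```python
def run_scheduled_step(source_rows, other_rows):
    """Guarded build: only produce the result table when the source has data.
    `other_rows` is a dict keyed by a hypothetical join column."""
    if not source_rows:  # i.e. COUNT(1) FROM data.source_table == 0
        return None      # skip the step; leave the result table untouched
    # Stand-in for: SELECT * FROM data.source_table LEFT JOIN data.other_source_table
    return [(row, other_rows.get(row)) for row in source_rows]

print(run_scheduled_step([], {'a': 1}))          # → None (source empty: don't run)
print(run_scheduled_step(['a', 'b'], {'a': 1}))  # → [('a', 1), ('b', None)]
```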

Accessing NOAA Data via BigQuery

Submitted by 自闭症网瘾萝莉.ら on 2021-01-29 15:53:38
Question: I am trying to access the NOAA data via BigQuery. I used the following code:

import os
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file("my-json-file-path-with-filename.json")

from google.cloud import bigquery

# Create a "Client" object
client = bigquery.Client(credentials=credentials)

But I get the following error:

DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE …
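The error indicates the client fell back to default credential discovery. One common fix, sketched below (the path is a placeholder), is to point the standard GOOGLE_APPLICATION_CREDENTIALS environment variable at the service-account key before any client is created, so google-auth's default discovery finds it even in code paths where explicit credentials aren't passed:

```python
import os

# Placeholder path: substitute the real service-account JSON key file.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account.json"

# Any client created after this point can rely on default credential
# discovery instead of an explicit credentials object, e.g.:
#   from google.cloud import bigquery
#   client = bigquery.Client()
print(os.environ["GOOGLE_APPLICATION_CREDENTIALS"])
```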

PyTest teardown_class is being run too soon

Submitted by ☆樱花仙子☆ on 2021-01-29 14:23:57
Question: The pytest teardown_class is not behaving as I expect it to. Below is a summary of my code:

@classmethod
def setup_class(cls):
    cls.create_table(table1)
    cls.create_table(table2)
    cls.create_table(table3)

@classmethod
def create_table(cls, some_arg_here):
    """Some code here that creates the table"""

def test_foo(self):
    """Some test code here"""

@classmethod
def teardown_class(cls):
    """Perform teardown things"""

I believe the way it is executing is that create_table is being called from setup …

BigQuery apply rank / percent_rank to column with a WHERE clause

Submitted by 喜欢而已 on 2021-01-29 13:48:04
Question: I have a fairly wide BigQuery table with roughly 20-30 columns, each of which needs a complementary percentile column showing that column's percentile value relative to all other rows in the table. However, each column should only receive a percentile value if the value in another column meets a certain threshold. To demonstrate, I created a reproducible example below:

WITH correct_games_played AS (
  SELECT "a" as name, 7 as num1, 0.4 as num2, 0.55 as num3 UNION ALL …
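The SQL is cut off, but the per-column logic being asked for is: rank only the rows that pass the threshold, and leave the rest NULL. A Python sketch of that conditional PERCENT_RANK (the function and variable names are my own):

```python
def conditional_percent_rank(values, eligible_mask):
    """PERCENT_RANK computed over only the eligible rows; ineligible rows
    get None, mirroring a NULL percentile column in SQL."""
    eligible = sorted(v for v, ok in zip(values, eligible_mask) if ok)
    n = len(eligible)
    out = []
    for v, ok in zip(values, eligible_mask):
        if not ok:
            out.append(None)     # below threshold: never enters the window
        elif n == 1:
            out.append(0.0)      # single-row window: PERCENT_RANK() is 0
        else:
            rank = eligible.index(v) + 1      # RANK(): ties share the lowest rank
            out.append((rank - 1) / (n - 1))  # PERCENT_RANK() = (rank-1)/(n-1)
    return out

print(conditional_percent_rank([7, 3, 5, 9], [True, False, True, True]))
# → [0.5, None, 0.0, 1.0]
```

Note that the ineligible rows are excluded from the window entirely, so they cannot skew the ranks of the rows that do qualify.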

Splitting a dataset for training and evaluation in BigQuery ML

Submitted by ε祈祈猫儿з on 2021-01-29 13:36:44
Question: Does BigQuery ML automatically split the dataset for training and evaluation? Or do we have to split it manually, with 80% of the dataset for training, 10% for validation, and 10% for evaluation, for logistic regression in BigQuery ML? If both are possible, which would be better? Thanks.

Answer 1: Yes, BigQuery ML will automatically split data for its validation processes. It would also be fairly common practice for you to manually split off a holdout set to perform some additional validation on data that …
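If a manual 80/10/10 split is wanted (for example, for a final holdout), deterministic hash-based bucketing is the usual pattern; in BigQuery this is commonly written as MOD(ABS(FARM_FINGERPRINT(key)), 10). A Python sketch of the same idea, with md5 standing in for FARM_FINGERPRINT:

```python
import hashlib

def split_bucket(row_key, train_buckets=8, valid_buckets=1):
    """Deterministic 80/10/10 split: hash a stable row key into 10 buckets.
    The same key always lands in the same bucket, so the split is reproducible."""
    bucket = int(hashlib.md5(row_key.encode()).hexdigest(), 16) % 10
    if bucket < train_buckets:
        return 'train'
    if bucket < train_buckets + valid_buckets:
        return 'validation'
    return 'test'

counts = {'train': 0, 'validation': 0, 'test': 0}
for i in range(1000):
    counts[split_bucket(f'user_{i}')] += 1
print(counts)  # roughly 800 / 100 / 100
```

Hashing a stable key (rather than using RAND()) means re-running the split never leaks rows between the training and holdout sets.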