google-bigquery

How to deal with semantic version data type in BigQuery

Submitted by ↘锁芯ラ on 2021-02-02 09:59:39
Question: I know that there is no semantic-version data type in BigQuery. How would you handle semantic versions in BigQuery? I have the following schema: software:string, software_version:string. The software_version column is a string, but the data I store there is in semver format: MAJOR.MINOR.PATCH-prerelease. In particular, I want to use the comparison operators <, >, and =. For example:

select '4.0.0' < '4.0.0-beta'

This returns true, but according to the semver definition it should be false, because the - character introduces a pre-release, and a pre-release has lower precedence than the corresponding release.

Answer 1: Below …
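Since the answer is cut off above, the comparison logic itself can be sketched outside SQL. Below is a minimal Python sketch of SemVer 2.0.0 precedence (the function name `semver_key` and the sample version list are my own; a BigQuery solution would have to express the same decomposition in SQL):

```python
def semver_key(version):
    """Sort key implementing SemVer 2.0.0 precedence for MAJOR.MINOR.PATCH
    versions with an optional -prerelease suffix."""
    core, _, pre = version.partition('-')
    core = core.split('+')[0]  # build metadata ("+...") is ignored for precedence
    pre = pre.split('+')[0]
    major, minor, patch = (int(x) for x in core.split('.'))
    if not pre:
        # A plain release has higher precedence than any of its pre-releases.
        return (major, minor, patch, (1,))
    identifiers = tuple(
        (0, int(ident), '') if ident.isdigit()  # numeric ids compare numerically
        else (1, 0, ident)                      # and rank below alphanumeric ones
        for ident in pre.split('.')
    )
    return (major, minor, patch, (0, identifiers))

versions = ['4.0.0', '4.0.0-beta', '1.0.0', '1.0.0-rc.1', '1.0.0-alpha']
print(sorted(versions, key=semver_key))
# → ['1.0.0-alpha', '1.0.0-rc.1', '1.0.0', '4.0.0-beta', '4.0.0']
```

With this key, '4.0.0-beta' correctly sorts before '4.0.0', unlike the plain string comparison in the question.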

BigQuery Scheduled Data Transfer throws “Incompatible table partitioning specification.” Error - but error message is truncated

Submitted by 强颜欢笑 on 2021-01-29 20:06:18
Question: I'm using the new BigQuery Data Transfer UI, and upon scheduling a Data Transfer, the transfer fails. The error message in Run History isn't terribly helpful, as it seems truncated:

Incompatible table partitioning specification. Expects partitioning specification interval(type:hour), but input partitioning specification is ; JobID: xxxxxxxxxxxx

Note the part of the error that says "...but input partitioning specification is..." with nothing before the semicolon. It seems this error is …

How to convert UTC time to local timezones based on a timezone column in BigQuery?

Submitted by 谁说胖子不能爱 on 2021-01-29 17:29:33
Question: I've been trying to convert each UTC timestamp back to the appropriate local timezone using standard SQL in BigQuery, but couldn't find a good way to do it dynamically, because there may be many different timezone names in the database. Does anyone have an idea? The table contains two columns (see screenshot).

Answer 1: Below example is for BigQuery Standard SQL:

#standardSQL
WITH `project.dataset.yourtable` AS (
  SELECT 'Pacific/Honolulu' timezone, TIMESTAMP '2020-03-01 03:41:27 …
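The same per-row conversion can be checked outside SQL with Python's standard zoneinfo module (the sample rows mirror the answer's Pacific/Honolulu example; the column layout is an assumption based on the question):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# (timezone, utc_timestamp) pairs, mirroring the two-column table in the question
rows = [
    ('Pacific/Honolulu', datetime(2020, 3, 1, 3, 41, 27, tzinfo=timezone.utc)),
    ('America/New_York', datetime(2020, 3, 1, 3, 41, 27, tzinfo=timezone.utc)),
]

for tz_name, utc_ts in rows:
    # astimezone applies the named zone's UTC offset (including its DST rules)
    local_ts = utc_ts.astimezone(ZoneInfo(tz_name))
    print(tz_name, local_ts.isoformat())
# Pacific/Honolulu is UTC-10 year-round, so the first row prints
# 2020-02-29T17:41:27-10:00
```

In BigQuery itself, the key point is that the time-zone argument of functions like DATETIME(timestamp_expression, time_zone) can be a column reference, which is what makes a dynamic, per-row conversion possible without hard-coding zone names.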

Return the highest count record

Submitted by 我的未来我决定 on 2021-01-29 16:43:33
Question: The data I am working on looks like this:

A_ID  B_ID  count
123   abcd  1000
123   aaaa  2000
123   aaaa  3000
456   null  50
456   bbbb  6000
456   cccc  450

I want to extract the B_ID that has the highest count for a given A_ID. The result should look like:

A_ID  B_ID  count
123   aaaa  3000
456   bbbb  6000

How do I achieve this result?

Answer 1: One option is to filter with a subquery:

select t.*
from mytable t
where t.count = (select max(t1.count) from mytable t1 where t1.a_id = t.a_id)

You can also use window …
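The subquery keeps every row whose count equals its group's maximum (so exact ties would return multiple rows per A_ID). The same logic, sketched as a single pass in Python over the question's sample data:

```python
rows = [
    {'a_id': 123, 'b_id': 'abcd', 'count': 1000},
    {'a_id': 123, 'b_id': 'aaaa', 'count': 2000},
    {'a_id': 123, 'b_id': 'aaaa', 'count': 3000},
    {'a_id': 456, 'b_id': None,   'count': 50},
    {'a_id': 456, 'b_id': 'bbbb', 'count': 6000},
    {'a_id': 456, 'b_id': 'cccc', 'count': 450},
]

# Keep, per a_id, the row with the highest count (first row wins on a tie).
best = {}
for row in rows:
    if row['a_id'] not in best or row['count'] > best[row['a_id']]['count']:
        best[row['a_id']] = row

result = sorted(best.values(), key=lambda r: r['a_id'])
print(result)
# → [{'a_id': 123, 'b_id': 'aaaa', 'count': 3000},
#    {'a_id': 456, 'b_id': 'bbbb', 'count': 6000}]
```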

IF Conditional to Run a Scheduled Query

Submitted by て烟熏妆下的殇ゞ on 2021-01-29 15:59:22
Question: I'm using BigQuery. I have a scheduled query that generates a table (RESULT TABLE) which depends on another table (SOURCE TABLE). The catch is that this source table doesn't always have data; it may be empty. I want the scheduled query to build the RESULT TABLE only if there is data in the SOURCE TABLE. In pseudocode:

IF (COUNT(1) FROM data.source_table) > 0
THEN RUN: SELECT * FROM data.source_table LEFT JOIN data.other_source_table
ELSE [Don't Run]

Thanks in …
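BigQuery scripting does support IF ... THEN ... END IF inside a scheduled query, which is one way to express this guard. The intended control flow can be sketched in Python (the function name, the dict-keyed join, and the sample data are illustrative stand-ins for the question's tables):

```python
def run_scheduled_step(source_rows, other_rows):
    """Guarded build: only produce the result table when the source has data.
    `other_rows` is a dict keyed by a hypothetical join column."""
    if not source_rows:  # i.e. COUNT(1) FROM data.source_table == 0
        return None      # skip the step; leave the result table untouched
    # Stand-in for: SELECT * FROM data.source_table LEFT JOIN data.other_source_table
    return [(row, other_rows.get(row)) for row in source_rows]

print(run_scheduled_step([], {'a': 1}))          # → None (source empty: don't run)
print(run_scheduled_step(['a', 'b'], {'a': 1}))  # → [('a', 1), ('b', None)]
```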

Accessing NOAA Data via BigQuery

Submitted by 自闭症网瘾萝莉.ら on 2021-01-29 15:53:38
Question: I am trying to access the NOAA data via BigQuery. I used the following code:

import os
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file("my-json-file-path-with-filename.json")

from google.cloud import bigquery

# Create a "Client" object
client = bigquery.Client(credentials=credentials)

But I get the following error:

DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE …
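The error indicates the client fell back to default credential discovery. One common fix, sketched below (the path is a placeholder), is to point the standard GOOGLE_APPLICATION_CREDENTIALS environment variable at the service-account key before any client is created, so google-auth's default discovery finds it even in code paths where explicit credentials aren't passed:

```python
import os

# Placeholder path: substitute the real service-account JSON key file.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account.json"

# Any client created after this point can rely on default credential
# discovery instead of an explicit credentials object, e.g.:
#   from google.cloud import bigquery
#   client = bigquery.Client()
print(os.environ["GOOGLE_APPLICATION_CREDENTIALS"])
```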

PyTest teardown_class is being run too soon

Submitted by ☆樱花仙子☆ on 2021-01-29 14:23:57
Question: The pytest teardown_class is not behaving as I expect it to. Below is a summary of my code:

@classmethod
def setup_class(cls):
    cls.create_table(table1)
    cls.create_table(table2)
    cls.create_table(table3)

@classmethod
def create_table(cls, some_arg_here):
    """Some code here that creates the table"""

def test_foo(self):
    """Some test code here"""

@classmethod
def teardown_class(cls):
    """Perform teardown things"""

I believe the way it is executing is that create_table is being called from setup …

BigQuery apply rank / percent_rank to column with a WHERE clause

Submitted by 喜欢而已 on 2021-01-29 13:48:04
Question: I have a fairly wide BigQuery table with roughly 20-30 columns, each of which needs a complementary percentile column showing that column's percentile value relative to all other rows in the table. However, each column should only receive a percentile value if the value in another column meets a certain threshold. To demonstrate, I created a reproducible example below:

WITH correct_games_played AS (
  SELECT "a" as name, 7 as num1, 0.4 as num2, 0.55 as num3 UNION ALL …
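The SQL is cut off, but the per-column logic being asked for is: rank only the rows that pass the threshold, and leave the rest NULL. A Python sketch of that conditional PERCENT_RANK (the function and variable names are my own):

```python
def conditional_percent_rank(values, eligible_mask):
    """PERCENT_RANK computed over only the eligible rows; ineligible rows
    get None, mirroring a NULL percentile column in SQL."""
    eligible = sorted(v for v, ok in zip(values, eligible_mask) if ok)
    n = len(eligible)
    out = []
    for v, ok in zip(values, eligible_mask):
        if not ok:
            out.append(None)     # below threshold: never enters the window
        elif n == 1:
            out.append(0.0)      # single-row window: PERCENT_RANK() is 0
        else:
            rank = eligible.index(v) + 1      # RANK(): ties share the lowest rank
            out.append((rank - 1) / (n - 1))  # PERCENT_RANK() = (rank-1)/(n-1)
    return out

print(conditional_percent_rank([7, 3, 5, 9], [True, False, True, True]))
# → [0.5, None, 0.0, 1.0]
```

Note that the ineligible rows are excluded from the window entirely, so they cannot skew the ranks of the rows that do qualify.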

Splitting a dataset for training and evaluation in BigQuery ML

Submitted by ε祈祈猫儿з on 2021-01-29 13:36:44
Question: Does BigQuery ML automatically split the dataset for training and evaluation? Or do we have to split it manually, with 80% of the dataset for training, 10% for validation, and 10% for evaluation, for logistic regression in BigQuery ML? If both are possible, which would be better? Thanks.

Answer 1: Yes, BigQuery ML will automatically split data for its validation processes. It would also be fairly common practice for you to manually split off a holdout set to perform some additional validation on data that …
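If a manual 80/10/10 split is wanted (for example, for a final holdout), deterministic hash-based bucketing is the usual pattern; in BigQuery this is commonly written as MOD(ABS(FARM_FINGERPRINT(key)), 10). A Python sketch of the same idea, with md5 standing in for FARM_FINGERPRINT:

```python
import hashlib

def split_bucket(row_key, train_buckets=8, valid_buckets=1):
    """Deterministic 80/10/10 split: hash a stable row key into 10 buckets.
    The same key always lands in the same bucket, so the split is reproducible."""
    bucket = int(hashlib.md5(row_key.encode()).hexdigest(), 16) % 10
    if bucket < train_buckets:
        return 'train'
    if bucket < train_buckets + valid_buckets:
        return 'validation'
    return 'test'

counts = {'train': 0, 'validation': 0, 'test': 0}
for i in range(1000):
    counts[split_bucket(f'user_{i}')] += 1
print(counts)  # roughly 800 / 100 / 100
```

Hashing a stable key (rather than using RAND()) means re-running the split never leaks rows between the training and holdout sets.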