google-bigquery

Delete Oldest Duplicate Rows from a BigQuery Table

我怕爱的太早我们不能终老 submitted on 2020-07-22 10:11:07
Question: I have a table with more than 70M rows, of which 2M are duplicates. I want to clean up the duplicates while keeping the most recent original row. I found a few solutions here (link), but they only remove the duplicates and do not retain the most recent row among them. Here is another common solution, which is not supported in BigQuery: ;WITH cte AS (SELECT Row_number() OVER (partition BY id ORDER BY updatedAt DESC, status DESC) RN FROM MainTable) DELETE FROM cte WHERE RN > 1 Answer 1: Here is
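The rule the ROW_NUMBER() query expresses (per id, keep the row that sorts first by updatedAt DESC, then status DESC) can be illustrated outside SQL. Below is a minimal sketch in JavaScript, the language of the Node.js snippet further down this page. It assumes rows are plain objects with `id`, `updatedAt` and `status` fields holding numbers or ISO-8601 strings. It is not the answer from the thread (which is truncated), and on a 70M-row table the selection would more practically stay inside BigQuery, for example by rewriting the table with a query that keeps only RN = 1.

```javascript
// Hypothetical sketch: keep one row per `id`, the one that would get RN = 1 under
// ROW_NUMBER() OVER (PARTITION BY id ORDER BY updatedAt DESC, status DESC).
// Assumes `updatedAt` and `status` hold numbers or ISO-8601 strings.

// Returns -1 if a should come first in descending order, 1 if b should, 0 on a tie.
function compareDesc(a, b) {
  if (a > b) return -1;
  if (a < b) return 1;
  return 0;
}

function keepLatestPerId(rows) {
  const best = new Map(); // id -> row currently ranked first for that id
  for (const row of rows) {
    const current = best.get(row.id);
    if (current === undefined) {
      best.set(row.id, row);
      continue;
    }
    // updatedAt DESC first; status DESC only breaks ties on updatedAt.
    const order =
      compareDesc(row.updatedAt, current.updatedAt) ||
      compareDesc(row.status, current.status);
    if (order < 0) best.set(row.id, row); // this row outranks the current winner
  }
  return [...best.values()];
}
```

Rows that tie on both columns keep whichever was seen first, much as ROW_NUMBER() picks an arbitrary row among ties.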

BigQuery integration in Firebase repeatedly returns 409

北慕城南 submitted on 2020-07-22 06:00:07
Question: I have a Firebase project that continuously exports its data to BigQuery (using the standard UI integration), automatically creating a new events_intraday_* table each day. However, Firebase keeps trying to create the table during the day, even though it has already been created, which results in status code 409. Could this be a role/permission issue? My Firebase service account in BigQuery only has the standard Editor role. Answer 1: I contacted Firebase support and got the following response:

Analyze Firebase data

两盒软妹~` submitted on 2020-07-21 06:29:05
Question: I have a mobile app that uses Firebase to store its data: all user data, various business objects and their relationships. I am looking for a way to analyze this data: I want to run queries and aggregations on it and generate reports. The Firebase site mentions using Google BigQuery, but there seems to be no easy way to import data from Firebase into it. What is the best way to achieve this? I know I can create daily backups, but after I have the raw JSON data how
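One way to bridge the gap the question describes is to reshape a node of the JSON backup into newline-delimited JSON (one object per line), a format BigQuery load jobs accept. The sketch below is hypothetical: it assumes the backup has already been parsed into a JavaScript object, that a top-level node (such as a `users` node) maps keys to records, and it introduces an illustrative `_key` column to preserve each Firebase key.

```javascript
// Hypothetical sketch: convert one parsed node of a Firebase Realtime Database
// JSON backup into newline-delimited JSON for a BigQuery load job.
// `_key` is an illustrative column name holding the original Firebase key.
function nodeToNdjson(node, keyField = "_key") {
  return Object.entries(node)
    .map(([key, value]) => {
      const isRecord =
        value !== null && typeof value === "object" && !Array.isArray(value);
      // Objects become one row each; scalars/arrays are wrapped in a `value` field.
      const row = isRecord
        ? { ...value, [keyField]: key }
        : { [keyField]: key, value: value };
      return JSON.stringify(row);
    })
    .join("\n");
}
```

Child nodes keyed by push IDs are left nested here; such keys (which can start with `-`) may not be valid BigQuery column names, so those nodes would likely need the same flattening before loading.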

How to Remove Diacritic Marks (such as Accents) using Unicode Normalization in Standard SQL?

半腔热情 submitted on 2020-07-18 05:39:37
Question: How can we remove diacritic marks from strings in BigQuery using the new NORMALIZE function, for example turning café into cafe? Answer 1: The Short Answer It's actually quite simple once you understand what NORMALIZE is doing: WITH data AS( SELECT 'Ãâíüçãõ' AS text ) SELECT REGEXP_REPLACE(NORMALIZE(text, NFD), r'\pM', '') nfd_result, REGEXP_REPLACE(NORMALIZE(text, NFKD), r'\pM', '') nfkd_result FROM data Results: Row 1: nfd_result = Aaiucao, nfkd_result = Aaiucao. You can use either the options "NFD" or "NFKD
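The same decompose-then-strip idea can be checked client-side. In JavaScript, `String.prototype.normalize` accepts the same NFD/NFKD forms, and the Unicode property escape `\p{M}` (which requires the `u` flag) matches the combining marks that decomposition separates out. This is a sketch for comparison, not part of the BigQuery answer.

```javascript
// Sketch: client-side counterpart of
// REGEXP_REPLACE(NORMALIZE(text, NFD), r'\pM', '').
function stripDiacritics(text, form = "NFD") {
  // Decompose, e.g. "é" -> "e" + U+0301 COMBINING ACUTE ACCENT ...
  const decomposed = text.normalize(form);
  // ... then drop every code point in Unicode category M (marks).
  return decomposed.replace(/\p{M}/gu, "");
}
```

NFKD additionally folds compatibility characters (the ligature "ﬁ" becomes "fi", which NFD leaves alone). Letters such as "ø" or "ł" have no canonical decomposition, so neither form strips them.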

How can I monitor incurred BigQuery billings costs (jobs completed) by table/dataset in real-time?

喜欢而已 submitted on 2020-07-15 09:43:51
Question: The biggest chunk of my BigQuery bill comes from query consumption. I am trying to optimize this by understanding which datasets/tables consume the most. I am therefore looking for a way to track my BigQuery usage, ideally in near real time (so that I don't have to wait a day to get the final results). The best option would show, for instance, how much each table/dataset consumed in the last hour. So far I have found the Monitoring dashboard, but this only allows to
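Once job metadata is available (for example from a query against BigQuery's INFORMATION_SCHEMA jobs views, whose rows include `creation_time`, `total_bytes_billed` and `referenced_tables`), the per-table roll-up for the last hour is a small aggregation. The sketch below assumes job records already fetched into that shape; the fetching step is omitted, and the price per TiB is a caller-supplied assumption, not a quoted rate.

```javascript
// Hypothetical sketch: sum bytes billed per referenced table over a recent window.
// Assumes job records shaped like INFORMATION_SCHEMA jobs rows:
// { creation_time: Date, total_bytes_billed: number,
//   referenced_tables: [{ project_id, dataset_id, table_id }] }.
const TIB = 2 ** 40;

function bytesBilledByTable(jobs, now, windowMs = 60 * 60 * 1000) {
  const totals = new Map(); // "project.dataset.table" -> bytes
  for (const job of jobs) {
    if (now - job.creation_time > windowMs) continue; // older than the window
    for (const t of job.referenced_tables || []) {
      const key = `${t.project_id}.${t.dataset_id}.${t.table_id}`;
      // A multi-table job is counted in full for each table it references.
      totals.set(key, (totals.get(key) || 0) + (job.total_bytes_billed || 0));
    }
  }
  return totals;
}

// Estimated cost; pricePerTiB must come from the current price list (not hard-coded).
function estimateCost(bytes, pricePerTiB) {
  return (bytes / TIB) * pricePerTiB;
}
```

Because a job's bytes are not broken down per table, counting the full amount for every referenced table makes each per-table figure an upper bound rather than an exact split of the bill.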

How to consolidate two id columns, identifying which rows belong to same set of related IDs

假装没事ソ submitted on 2020-07-14 11:20:27
Question: I have 2 ID columns that are created/collected independently. I'm trying to consolidate them into one by determining which rows belong to the same group of related IDs, based on either of the two columns. I would consider rows related based on a few rules: 1: If a LOAN has the same value in multiple rows, they belong to the same group (in the example for reference only). I've called it loan_group. No issues here. 2: If a COLLATERAL has the same value in multiple
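Rules like these describe connected components: two rows are in the same group if a chain of shared LOAN or COLLATERAL values links them. Because the question's rule list is cut off after rule 2, the sketch below assumes exactly that transitive rule and implements it with union-find in JavaScript; the `{ loan, collateral }` row shape is hypothetical.

```javascript
// Sketch: group rows that share a LOAN or COLLATERAL value, directly or transitively.
// Rows are hypothetical { loan, collateral } objects; a null ID links nothing.
function groupRows(rows) {
  const parent = new Map();
  const find = (x) => {
    if (!parent.has(x)) parent.set(x, x);
    let root = x;
    while (parent.get(root) !== root) root = parent.get(root);
    // Path compression: point every node on the path straight at the root.
    while (parent.get(x) !== root) {
      const next = parent.get(x);
      parent.set(x, root);
      x = next;
    }
    return root;
  };
  const union = (a, b) => {
    const ra = find(a);
    const rb = find(b);
    if (ra !== rb) parent.set(ra, rb);
  };
  // Prefixes keep a LOAN value and an equal COLLATERAL value distinct.
  rows.forEach((row, i) => {
    const rowKey = `row:${i}`;
    find(rowKey);
    if (row.loan != null) union(rowKey, `loan:${row.loan}`);
    if (row.collateral != null) union(rowKey, `coll:${row.collateral}`);
  });
  // Number groups 1, 2, ... in order of first appearance.
  const groupIds = new Map();
  return rows.map((_, i) => {
    const root = find(`row:${i}`);
    if (!groupIds.has(root)) groupIds.set(root, groupIds.size + 1);
    return groupIds.get(root);
  });
}
```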

How to link child and parent ids using BigQuery

久未见 submitted on 2020-07-13 12:54:47
Question: Let's say I have the following table:

child_id  parent_id
1         2
2         3
3         -
4         5
5         6
6         -
7         8

and I want to create the following table:

child_id  parent_id  branch_id
1         2          1
2         3          1
3         -          1
4         5          2
5         6          2
6         -          2
7         8          3

in which branch_id denotes groups of rows that are linked together by their parent_ids. However, the row order is not guaranteed and branches may contain hundreds of rows, which rules out a simple use of the LAG() function. How can I achieve this given the limitations of BigQuery's SQL? Answer 1: Below
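The grouping the question asks for amounts to following parent links up to the top of each chain. The sketch below does that in JavaScript with a memoized walk, so row order does not matter. It assumes numeric ids, `null` in place of the `-` root marker, and no cycles, and it numbers branches by their smallest child_id, which reproduces the expected table in the question. It illustrates the logic only; it is not the truncated SQL answer.

```javascript
// Sketch: give every row a branch_id by walking parent links to the root of its chain.
// Assumes numeric ids, parent_id === null for roots ("-" in the question), and no cycles.
function assignBranches(rows) {
  const parentOf = new Map(rows.map((r) => [r.child_id, r.parent_id]));
  const rootCache = new Map(); // id -> root of its chain, filled in as we go

  const rootOf = (id) => {
    const path = [];
    let cur = id;
    // Climb until a cached id, a root (null parent), or an id with no row of its own (e.g. 8).
    while (!rootCache.has(cur) && parentOf.get(cur) != null) {
      path.push(cur);
      cur = parentOf.get(cur);
    }
    const root = rootCache.has(cur) ? rootCache.get(cur) : cur;
    rootCache.set(cur, root);
    for (const p of path) rootCache.set(p, root); // memoize the whole path
    return root;
  };

  // Smallest child_id per root, used only to number branches in a stable order.
  const minChild = new Map();
  for (const r of rows) {
    const root = rootOf(r.child_id);
    const m = minChild.get(root);
    if (m === undefined || r.child_id < m) minChild.set(root, r.child_id);
  }
  const ordered = [...minChild.entries()].sort((a, b) => a[1] - b[1]);
  const branchOf = new Map(ordered.map(([root], i) => [root, i + 1]));

  return rows.map((r) => ({ ...r, branch_id: branchOf.get(rootOf(r.child_id)) }));
}
```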

Creating a routine using the Google BigQuery client returns a 'callback is not a function' error

放肆的年华 submitted on 2020-07-11 04:19:09
Question: Using the Google BigQuery client, I am trying to create a routine, but it fails with a "callback is not a function" error: const { BigQuery } = require('@google-cloud/bigquery') const projectId = 'bigqueryproject1-279307' const keyFilename = '../credentials/client_secrets.json' const bigqueryClient = new BigQuery({ projectId, keyFilename }) const dataset = bigqueryClient.dataset('babynames') const routine = dataset.routine('analysis_routine') async function createRoutine () { const